Received: March 22, 2021; Revised: April 19, 2021
Abstract: To address the low recognition accuracy caused by large variations in the scale of crowd targets in dense scenes, this paper proposes two multi-scale feature fusion structures: the attention-weighted fusion module (AWF) and the bottom-up fusion module (BUF). The AWF module introduces an attention branch that learns weights for the feature maps, and the weighted multi-level features are then superposed. The BUF module uses dilated convolution to capture more scale information when processing the feature maps, and it fuses the shallow feature maps by concatenation. Feature maps processed by either fusion module are more expressive, and the predicted density maps are more accurate. The algorithm uses ResNet50 as the backbone network for feature extraction, applies the AWF and BUF modules separately for feature fusion, and is evaluated on public datasets. With the AWF module, the counting algorithm lowers the mean absolute error (MAE) on the Shanghai Tech dataset to 45.54 (Part A) and 7.6 (Part B) and the mean square error (MSE) to 100.28 (Part A) and 11.4 (Part B); on the UCF_CC_50 dataset, the MAE and MSE fall to 212.42 and 323.06, respectively. With the BUF module, the MAE on the Shanghai Tech dataset is 51.6 (Part A) and 8.0 (Part B) and the MSE is 102 (Part A) and 12.8 (Part B); on the UCF_CC_50 dataset, the MAE and MSE are 242.6 and 359.5, respectively. The experimental results indicate that both the AWF module and the BUF module effectively fuse deep and shallow feature information, which refines the feature maps and improves counting accuracy.
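The MAE and MSE figures quoted in the abstract are errors over per-image head counts. The following is a minimal sketch of how such metrics are commonly computed, not the authors' evaluation code. It assumes the convention widespread in crowd counting work, where the value reported as "MSE" is the square root of the mean squared count error, and it assumes the predicted count is the sum of the density map. The function names `count_from_density`, `mae`, and `mse` are hypothetical.

```python
import math


def count_from_density(density_map):
    """Predicted head count: the sum of all values in a 2-D density map (assumption)."""
    return sum(sum(row) for row in density_map)


def mae(pred_counts, true_counts):
    """Mean absolute error between predicted and ground-truth counts."""
    errors = [abs(p - t) for p, t in zip(pred_counts, true_counts)]
    return sum(errors) / len(errors)


def mse(pred_counts, true_counts):
    """Square root of the mean squared count error.

    Assumption: this follows the usual crowd counting convention for the
    figure reported as "MSE"; the abstract does not state the formula.
    """
    squared = [(p - t) ** 2 for p, t in zip(pred_counts, true_counts)]
    return math.sqrt(sum(squared) / len(squared))


if __name__ == "__main__":
    # Toy counts for two images, purely illustrative.
    predicted = [10.0, 20.0]
    ground_truth = [12.0, 17.0]
    print(mae(predicted, ground_truth))
    print(mse(predicted, ground_truth))
```

Under this convention, MSE penalizes large errors on individual images more heavily than MAE, which may be why the reported MSE values are consistently higher than the corresponding MAE values.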
Keywords: crowd counting; multi-scale information; feature fusion; attention-weighted fusion; dilated convolution
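Funding: National Natural Science Foundation of China (61733015); High-Speed Railway Joint Fund (U1934204); Sichuan Province Key Research and Development Program (2020YFQ0057)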
Citation:
杨旭,黄进,秦泽宇,郑思宇,付国栋.基于多尺度特征融合的人群计数算法.计算机系统应用,2022,31(1):226-235
YANG Xu, HUANG Jin, QIN Ze-Yu, ZHENG Si-Yu, FU Guo-Dong. Crowd Counting Algorithm Based on Multi-scale Feature Fusion. Computer Systems Applications, 2022, 31(1): 226-235