Crowd Counting Algorithm Based on Dual Attention Mechanism
Authors: Xu Xiaochen, Ge Yan, Du Junwei, Chen Zhuo
Funding: Natural Science Foundation of Shandong Province (ZR2021MF092)
    Abstract:

    To address common problems in crowd counting, such as complex backgrounds, occlusion, and uneven crowd distribution, this paper proposes a joint loss-based space-channel dual attention network (JL-SCDANet), a convolutional neural network that combines spatial and channel attention with a joint loss. The front end of the network extracts coarse-grained image features. The middle stage adds a spatial attention mechanism and a channel attention mechanism to highlight the key regions of the image. The back end uses dilated convolution, which enlarges the receptive field without reducing image resolution, to extract deep two-dimensional features. In addition, the model is trained with a joint loss function to improve its robustness. To verify the improvement, comparative experiments are carried out on three public datasets: ShanghaiTech Part B, Mall, and UCF_CC_50. The mean absolute error (MAE) and mean square error (MSE) reach 8.13 and 13.13 on ShanghaiTech Part B, 1.78 and 2.28 on Mall, and 182.12 and 210.24 on UCF_CC_50. The experimental results demonstrate that the network is effective in improving crowd counting accuracy.
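The claim that dilated convolution enlarges the receptive field without losing resolution can be checked with simple arithmetic. The sketch below is illustrative only: the 3x3 kernel, dilation rate 2, three-layer stack, and 64-pixel feature map are assumptions chosen for the example, since the abstract does not state the configuration used in JL-SCDANet.

```python
def effective_kernel_size(kernel_size: int, dilation: int) -> int:
    """Span covered by a dilated kernel along one axis."""
    return kernel_size + (kernel_size - 1) * (dilation - 1)


def conv_output_size(size: int, kernel_size: int, dilation: int = 1,
                     padding: int = 0, stride: int = 1) -> int:
    """Output length of a convolution along one axis."""
    span = effective_kernel_size(kernel_size, dilation)
    return (size + 2 * padding - span) // stride + 1


def receptive_field(layers) -> int:
    """Receptive field of a stack of stride-1 convolution layers.

    `layers` is a sequence of (kernel_size, dilation) pairs.
    """
    field = 1
    for kernel_size, dilation in layers:
        field += (kernel_size - 1) * dilation
    return field


# Hypothetical settings, not taken from the paper.
feature_map_size = 64
plain_stack = [(3, 1)] * 3      # three ordinary 3x3 convolutions
dilated_stack = [(3, 2)] * 3    # three 3x3 convolutions with dilation 2

# With padding equal to the dilation rate, a 3x3 dilated layer keeps the size.
same_size = conv_output_size(feature_map_size, 3, dilation=2, padding=2)

print("output size:", same_size)
print("receptive field, plain:", receptive_field(plain_stack))
print("receptive field, dilated:", receptive_field(dilated_stack))
```

Under these assumptions the dilated stack sees a wider window than the plain stack with the same number of weights, and no pooling or striding is needed to get there, which is why the feature map keeps its size.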

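The two evaluation metrics named in the abstract can be sketched as follows. The abstract does not say whether its MSE is the plain mean of squared errors or its square root; crowd counting papers often report the root under the name MSE, so the `take_root` parameter here is an assumption that exposes both conventions. The counts are made up for illustration and are not data from the paper.

```python
import math


def mean_absolute_error(predicted, actual):
    """Average absolute difference between predicted and true counts."""
    if len(predicted) != len(actual) or not predicted:
        raise ValueError("inputs must be non-empty and of equal length")
    errors = [abs(p - a) for p, a in zip(predicted, actual)]
    return sum(errors) / len(errors)


def mean_squared_error(predicted, actual, take_root=True):
    """Mean of squared count errors; square-rooted when take_root is True."""
    if len(predicted) != len(actual) or not predicted:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = [(p - a) ** 2 for p, a in zip(predicted, actual)]
    mean = sum(squared) / len(squared)
    return math.sqrt(mean) if take_root else mean


# Hypothetical per-image counts, for illustration only.
predicted_counts = [102.0, 48.0, 250.0, 75.0]
actual_counts = [100.0, 50.0, 240.0, 80.0]

mae = mean_absolute_error(predicted_counts, actual_counts)
mse = mean_squared_error(predicted_counts, actual_counts)
print(f"MAE = {mae:.2f}, MSE = {mse:.2f}")
```

MAE reflects the average counting error per image, while the squared variant penalizes large misses more heavily, so it is usually read as an indicator of robustness.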
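Cite this article

Xu Xiaochen, Ge Yan, Du Junwei, Chen Zhuo. Crowd counting algorithm based on dual attention mechanism. Computer Systems & Applications (计算机系统应用), 2023, 32(1): 241-248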

History
  • Received: 2022-05-16
  • Last revised: 2022-06-15
  • Published online: 2022-11-14