Dense Crowd Counting Algorithm Based on Multi-scale Feature Weighted Fusion Attention
Authors: Shi Dongliang, Ge Yan, Xu Mujun

    Abstract:

    To address the challenges of crowd counting, namely inconsistent head sizes, uneven crowd density distribution, and interference from complex backgrounds, this paper proposes a convolutional neural network (CNN) model, the multi-scale feature weighted fusion attention convolutional neural network (MSFANet), which handles multi-scale variation and strengthens attention on crowd regions. The front end of the network uses an improved VGG-16 model to perform a first, coarse-grained feature extraction on the input crowd image. A multi-scale feature extraction module in the middle then extracts multi-scale feature information, and an attention module weights the resulting multi-scale features. The back end uses a sawtooth-shaped dilated convolution module to enlarge the receptive field, extract detailed image features, and generate high-quality crowd density maps. The model is evaluated on three public datasets. On ShanghaiTech Part B, it achieves a mean absolute error (MAE) of 7.8 and a mean squared error (MSE) of 12.5; on ShanghaiTech Part A, an MAE of 64.9 and an MSE of 108.4; on UCF_CC_50, an MAE of 185.1 and an MSE of 249.8. These results indicate that the model achieves good accuracy and robustness.
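The two metrics reported in the abstract can be illustrated with a minimal Python sketch. The function names and the sample counts are hypothetical and chosen only for illustration; they are not taken from the paper. Note that many crowd counting papers report the square root of the mean squared error under the name "MSE". Whether this paper follows that convention is an assumption, so both variants are shown.

```python
import math


def mae(pred_counts, true_counts):
    """Mean absolute error between predicted and ground-truth counts per image."""
    n = len(true_counts)
    return sum(abs(p - t) for p, t in zip(pred_counts, true_counts)) / n


def mse(pred_counts, true_counts):
    """Mean squared error between predicted and ground-truth counts per image."""
    n = len(true_counts)
    return sum((p - t) ** 2 for p, t in zip(pred_counts, true_counts)) / n


def rmse(pred_counts, true_counts):
    """Root of the mean squared error (often labelled "MSE" in this field)."""
    return math.sqrt(mse(pred_counts, true_counts))


# Hypothetical per-image counts, for illustration only.
# A predicted count is typically the sum over the estimated density map.
predicted = [10.0, 20.0]
ground_truth = [12.0, 17.0]

print(mae(predicted, ground_truth))
print(mse(predicted, ground_truth))
print(rmse(predicted, ground_truth))
```

MAE reflects the average counting accuracy, while the squared-error metric penalizes large per-image errors more heavily and is therefore commonly read as an indicator of robustness.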

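Cite this article

Shi Dongliang, Ge Yan, Xu Mujun. Dense Crowd Counting Algorithm Based on Multi-scale Feature Weighted Fusion Attention. Computer Systems & Applications (计算机系统应用), 2025, 34(3): 210-219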

History
  • Received: 2024-08-06
  • Last revised: 2024-08-27
  • Published online: 2024-12-09