YOLO Pedestrian Detection Algorithm Combining Attention Mechanism and Multi-scale Convolution

Authors: Sun Jiahui, Ge Huayong, Zhang Zhehao

    Abstract:

    To improve pedestrian detection performance, this study proposes a detection algorithm based on an improved YOLOv4 that combines SqueezeNet, an attention mechanism, dilated convolution, and the Inception structure. An attention module named D-CBAM, which couples CBAM with dilated convolution, is introduced into the feature-enhancement part to select the information important for detection from the extracted features; residual connections are also added in this part to improve feature reuse. In addition, an Inception-fire module, which combines the “squeeze-expand” structure of SqueezeNet with the multi-scale convolution kernels of Inception, is proposed to replace the consecutive convolution layers in the network; widening the network in this way improves detection performance while reducing the number of parameters. Finally, the loss function is improved according to the characteristics of pedestrian detection by following focal loss: weight factors are added for positive versus negative samples and for hard versus easy samples, so that training emphasizes positive and hard-to-classify samples and the detection ability of the network is strengthened. On the INRIA person dataset, the improved YOLO reaches a detection accuracy of 94.95%, which is 4.25% higher than that of the original YOLOv4, while the number of parameters is reduced by 36.35% and the detection speed is increased by 13.54%. In short, the improved algorithm outperforms YOLOv4 in pedestrian detection.
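The parameter saving claimed for the Inception-fire module can be illustrated with a back-of-the-envelope count: a 1×1 "squeeze" layer shrinks the channels, then parallel multi-scale "expand" branches are concatenated. The channel numbers below are illustrative assumptions, not the paper's actual configuration:

```python
def conv_params(k, c_in, c_out):
    """Weight count of a k x k convolution (bias ignored)."""
    return k * k * c_in * c_out

# Baseline: two stacked 3x3 convolutions, 256 -> 256 -> 256 channels,
# standing in for the "consecutive convolution layers" being replaced.
baseline = conv_params(3, 256, 256) + conv_params(3, 256, 256)

# Inception-fire sketch: squeeze to few channels, then expand through
# parallel 1x1 / 3x3 / 5x5 branches whose outputs are concatenated,
# widening the network at a fraction of the parameter cost.
squeeze = conv_params(1, 256, 64)
expand = (conv_params(1, 64, 128)    # 1x1 branch
          + conv_params(3, 64, 96)   # 3x3 branch
          + conv_params(5, 64, 32))  # 5x5 branch; 128 + 96 + 32 = 256 out
inception_fire = squeeze + expand

print(baseline, inception_fire)  # the fire module needs far fewer weights
```

With these (assumed) channel widths the fire module uses roughly one ninth of the baseline's weights while still producing 256 output channels from three kernel scales, which matches the abstract's claim of more width with fewer parameters.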

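The loss-function change described above follows focal loss: one weight factor balances positive against negative samples, and a modulating factor down-weights easy samples. A minimal sketch of that idea for a single prediction (the paper's exact weight values are not given in the abstract, so `alpha` and `gamma` below are the conventional defaults, not the authors' settings):

```python
import math

def focal_loss(p, y, alpha=0.25, gamma=2.0):
    """Focal-loss-style weighted cross-entropy for one prediction.

    p: predicted probability of the positive (pedestrian) class.
    y: ground-truth label (1 = positive, 0 = negative).
    alpha_t weights positive vs. negative samples; (1 - p_t)**gamma
    shrinks the loss of easy samples so training focuses on hard ones.
    """
    p_t = p if y == 1 else 1.0 - p           # probability of the true class
    alpha_t = alpha if y == 1 else 1.0 - alpha
    return -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)

# A hard positive (p = 0.1) contributes far more loss than an easy one (p = 0.9).
print(focal_loss(0.1, 1), focal_loss(0.9, 1))
```

Setting `gamma = 0` and `alpha = 0.5` recovers plain (halved) cross-entropy, which is a quick way to sanity-check the weighting.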
    References
    [1] Girshick R, Donahue J, Darrell T, et al. Rich feature hierarchies for accurate object detection and semantic segmentation. 2014 IEEE Conference on Computer Vision and Pattern Recognition. Columbus: IEEE, 2014. 580–587.
    [2] Girshick R. Fast R-CNN. Proceedings of the IEEE International Conference on Computer Vision (ICCV). Santiago: IEEE, 2015. 1440–1448.
    [3] Ren SQ, He KM, Girshick R, et al. Faster R-CNN: Towards real-time object detection with region proposal networks. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2017, 39(6): 1137–1149. doi: 10.1109/TPAMI.2016.2577031
    [4] Redmon J, Divvala S, Girshick R, et al. You only look once: Unified, real-time object detection. 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). Las Vegas: IEEE, 2016. 779–788.
    [5] Hu J, Shen L, Sun G. Squeeze-and-excitation networks. 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Salt Lake City: IEEE, 2018. 7132–7141.
    [6] Woo S, Park J, Lee JY, et al. CBAM: Convolutional block attention module. Proceedings of the 15th European Conference on Computer Vision (ECCV). Munich: Springer, 2018. 3–19.
    [7] Szegedy C, Liu W, Jia YQ, et al. Going deeper with convolutions. 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). Boston: IEEE, 2015. 1–9.
    [8] Iandola FN, Moskewicz MW, Ashraf K, et al. SqueezeNet: AlexNet-level accuracy with 50x fewer parameters and <1MB model size. arXiv: 1602.07360, 2016. 1–13.
    [9] Howard AG, Zhu ML, Chen B, et al. MobileNets: Efficient convolutional neural networks for mobile vision applications. arXiv: 1704.04861, 2017. 1–9.
    [10] Li Y, Lv C. SS-YOLO: An object detection algorithm based on YOLOv3 and ShuffleNet. 2020 IEEE 4th Information Technology, Networking, Electronic and Automation Control Conference (ITNEC). Chongqing: IEEE, 2020. 769–772.
    [11] Fang W, Wang L, Ren PM. Tinier-YOLO: A real-time object detection method for constrained environments. IEEE Access, 2020, 8: 1935–1944. doi: 10.1109/ACCESS.2019.2961959
    [12] Jiang JY, Wu Y, Long HY, et al. PD-CenterNet: A real-time pedestrian detection model based on CenterNet. Computer Engineering, 2020: 1–9 (in Chinese).
    [13] Bochkovskiy A, Wang CY, Liao HYM. YOLOv4: Optimal speed and accuracy of object detection. arXiv: 2004.10934, 2020. 1–17.
    [14] Wang CY, Liao HYM, Wu YH, et al. CSPNet: A new backbone that can enhance learning capability of CNN. 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW). Seattle: IEEE, 2020. 1571–1580.
    [15] Redmon J, Farhadi A. YOLOv3: An incremental improvement. arXiv: 1804.02767, 2018. 1–6.
    [16] He KM, Zhang XY, Ren SQ, et al. Deep residual learning for image recognition. 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). Las Vegas: IEEE, 2016. 770–778.
    [17] Huan H, Chen YF, Zhang L, et al. Improved object detection algorithm based on BR-YOLOv3. Computer Engineering, 2020: 1–12 (in Chinese).
    [18] Lin TY, Goyal P, Girshick R, et al. Focal loss for dense object detection. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2020, 42(2): 318–327. doi: 10.1109/TPAMI.2018.2858826
Cite this article

Sun Jiahui, Ge Huayong, Zhang Zhehao. YOLO pedestrian detection algorithm combining attention mechanism and multi-scale convolution. Computer Systems & Applications, 2022, 31(4): 171–179 (in Chinese).

History
  • Received: 2021-07-04
  • Revised: 2021-07-30
  • Published online: 2022-03-22