YOLO Pedestrian Detection Algorithm Combining Attention Mechanism and Multi-scale Convolution

Authors: Sun Jiahui, Ge Huayong, Zhang Zhehao

    Abstract:

    To improve pedestrian detection performance, this study proposes a detection algorithm based on an improved YOLOv4 that combines SqueezeNet, an attention mechanism, dilated convolution, and the Inception structure. An attention module named D-CBAM, which couples CBAM with dilated convolution, is introduced into the feature-enhancement part to select the information important for detection from the extracted features; residual connections are also added in this part to improve feature reuse. In addition, an Inception-fire module, which combines the “squeeze-expand” structure of SqueezeNet with the multi-scale convolution kernels of Inception, is proposed to replace the consecutive convolution layers in the network; widening the network in this way improves detection performance while reducing the number of parameters. Finally, the loss function is improved according to the characteristics of pedestrian detection by following focal loss: weight factors are added for positive versus negative samples and for hard versus easy samples, so that training emphasizes positive and hard-to-classify samples and the detection ability of the network is strengthened. On the INRIA person dataset, the improved YOLO reaches a detection accuracy of 94.95%, which is 4.25% higher than that of the original YOLOv4, while the number of parameters is reduced by 36.35% and the detection speed is increased by 13.54%. In short, the improved algorithm outperforms YOLOv4 in pedestrian detection.
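The parameter saving claimed for the Inception-fire module can be illustrated with a back-of-the-envelope count: a 1×1 "squeeze" layer shrinks the channels, then parallel multi-scale "expand" branches are concatenated. The channel numbers below are illustrative assumptions, not the paper's actual configuration:

```python
def conv_params(k, c_in, c_out):
    """Weight count of a k x k convolution (bias ignored)."""
    return k * k * c_in * c_out

# Baseline: two stacked 3x3 convolutions, 256 -> 256 -> 256 channels,
# standing in for the "consecutive convolution layers" being replaced.
baseline = conv_params(3, 256, 256) + conv_params(3, 256, 256)

# Inception-fire sketch: squeeze to few channels, then expand through
# parallel 1x1 / 3x3 / 5x5 branches whose outputs are concatenated,
# widening the network at a fraction of the parameter cost.
squeeze = conv_params(1, 256, 64)
expand = (conv_params(1, 64, 128)    # 1x1 branch
          + conv_params(3, 64, 96)   # 3x3 branch
          + conv_params(5, 64, 32))  # 5x5 branch; 128 + 96 + 32 = 256 out
inception_fire = squeeze + expand

print(baseline, inception_fire)  # the fire module needs far fewer weights
```

With these (assumed) channel widths the fire module uses roughly one ninth of the baseline's weights while still producing 256 output channels from three kernel scales, which matches the abstract's claim of more width with fewer parameters.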

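The loss-function change described above follows focal loss: one weight factor balances positive against negative samples, and a modulating factor down-weights easy samples. A minimal sketch of that idea for a single prediction (the paper's exact weight values are not given in the abstract, so `alpha` and `gamma` below are the conventional defaults, not the authors' settings):

```python
import math

def focal_loss(p, y, alpha=0.25, gamma=2.0):
    """Focal-loss-style weighted cross-entropy for one prediction.

    p: predicted probability of the positive (pedestrian) class.
    y: ground-truth label (1 = positive, 0 = negative).
    alpha_t weights positive vs. negative samples; (1 - p_t)**gamma
    shrinks the loss of easy samples so training focuses on hard ones.
    """
    p_t = p if y == 1 else 1.0 - p           # probability of the true class
    alpha_t = alpha if y == 1 else 1.0 - alpha
    return -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)

# A hard positive (p = 0.1) contributes far more loss than an easy one (p = 0.9).
print(focal_loss(0.1, 1), focal_loss(0.9, 1))
```

Setting `gamma = 0` and `alpha = 0.5` recovers plain (halved) cross-entropy, which is a quick way to sanity-check the weighting.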
    References
    [1] Girshick R, Donahue J, Darrell T, et al. Rich feature hierarchies for accurate object detection and semantic segmentation. 2014 IEEE Conference on Computer Vision and Pattern Recognition. Columbus: IEEE, 2014. 580–587.
    [2] Girshick R. Fast R-CNN. Proceedings of the IEEE International Conference on Computer Vision (ICCV). Santiago: IEEE, 2015. 1440–1448.
    [3] Ren SQ, He KM, Girshick R, et al. Faster R-CNN: Towards real-time object detection with region proposal networks. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2017, 39(6): 1137–1149. doi: 10.1109/TPAMI.2016.2577031
    [4] Redmon J, Divvala S, Girshick R, et al. You only look once: Unified, real-time object detection. 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). Las Vegas: IEEE, 2016. 779–788.
    [5] Hu J, Shen L, Sun G. Squeeze-and-excitation networks. 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Salt Lake City: IEEE, 2018. 7132–7141.
    [6] Woo S, Park J, Lee JY, et al. CBAM: Convolutional block attention module. Proceedings of the 15th European Conference on Computer Vision (ECCV). Munich: Springer, 2018. 3–19.
    [7] Szegedy C, Liu W, Jia YQ, et al. Going deeper with convolutions. 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). Boston: IEEE, 2015. 1–9.
    [8] Iandola FN, Moskewicz MW, Ashraf K, et al. SqueezeNet: AlexNet-level accuracy with 50x fewer parameters and <1MB model size. arXiv: 1602.07360, 2016. 1–13.
    [9] Howard AG, Zhu ML, Chen B, et al. MobileNets: Efficient convolutional neural networks for mobile vision applications. arXiv: 1704.04861, 2017. 1–9.
    [10] Li Y, Lv C. SS-YOLO: An object detection algorithm based on YOLOv3 and ShuffleNet. 2020 IEEE 4th Information Technology, Networking, Electronic and Automation Control Conference (ITNEC). Chongqing: IEEE, 2020. 769–772.
    [11] Fang W, Wang L, Ren PM. Tinier-YOLO: A real-time object detection method for constrained environments. IEEE Access, 2020, 8: 1935–1944. doi: 10.1109/ACCESS.2019.2961959
    [12] Jiang JY, Wu Y, Long HY, et al. PD-CenterNet: A real-time pedestrian detection model based on CenterNet. Computer Engineering, 2020: 1–9 (in Chinese).
    [13] Bochkovskiy A, Wang CY, Liao HYM. YOLOv4: Optimal speed and accuracy of object detection. arXiv: 2004.10934, 2020. 1–17.
    [14] Wang CY, Liao HYM, Wu YH, et al. CSPNet: A new backbone that can enhance learning capability of CNN. 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW). Seattle: IEEE, 2020. 1571–1580.
    [15] Redmon J, Farhadi A. YOLOv3: An incremental improvement. arXiv: 1804.02767, 2018. 1–6.
    [16] He KM, Zhang XY, Ren SQ, et al. Deep residual learning for image recognition. 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). Las Vegas: IEEE, 2016. 770–778.
    [17] Huan H, Chen YF, Zhang L, et al. Improved object detection algorithm based on BR-YOLOv3. Computer Engineering, 2020: 1–12 (in Chinese).
    [18] Lin TY, Goyal P, Girshick R, et al. Focal loss for dense object detection. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2020, 42(2): 318–327. doi: 10.1109/TPAMI.2018.2858826
Cite this article

Sun Jiahui, Ge Huayong, Zhang Zhehao. YOLO pedestrian detection algorithm combining attention mechanism and multi-scale convolution. Computer Systems & Applications, 2022, 31(4): 171–179 (in Chinese).

History
  • Received: 2021-07-04
  • Revised: 2021-07-30
  • Published online: 2022-03-22