YOLOv5 Traffic Object Detection Based on GhostNet and Attention Mechanism
Authors: 皇甫俊逸, 孟乔, 孟令辰, 谢宇鹏
Funding: Young and Middle-aged Researchers Fund of Qinghai University (2019-QGY-15)
    Abstract:

    Traffic object detection models suffer from large parameter counts, low detection accuracy, slow detection speed, and poor generalization. To address these problems, a real-time YOLOv5 traffic object detection model based on GhostNet and an attention mechanism is proposed. A K-means clustering method refined by a genetic algorithm is used to obtain the prior (anchor) boxes best suited to vehicle detection. Lightweight Ghost convolution (GhostConv) is used to extract target features, and a C3Ghost module based on the CSP structure is constructed, which greatly compresses the number of model parameters, reduces computational cost, and improves inference speed. A Transformer block and a CBAM attention module are added to the feature fusion layer to exploit the model's feature extraction potential and to locate attention regions in scenes with dense objects. Ablation experiments and a comprehensive performance evaluation on the UA-DETRAC dataset show that the proposed model reaches an average precision of 98.68% with 47 M parameters and a detection speed of 65 FPS. Compared with YOLOv5, the parameter count is compressed by 34%, the speed is increased by 43%, and the average precision is improved by 1.05%.
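    The anchor-selection step described above (K-means refined by a genetic algorithm, in the spirit of YOLOv5's autoanchor routine) can be sketched roughly as follows. The 1 − IoU clustering distance, the best-anchor-IoU fitness function, and the mutation scale are illustrative assumptions, not the exact settings used in the paper:

```python
import numpy as np

def iou_wh(boxes, anchors):
    """IoU between (w, h) pairs, treating all boxes as co-centered."""
    inter = np.minimum(boxes[:, None, :], anchors[None, :, :]).prod(2)
    union = boxes.prod(1)[:, None] + anchors.prod(1)[None, :] - inter
    return inter / union

def fitness(boxes, anchors):
    """Mean IoU between each ground-truth box and its best-matching anchor."""
    return iou_wh(boxes, anchors).max(1).mean()

def kmeans_anchors(boxes, k, iters=30, mutations=200, sigma=0.1, seed=0):
    rng = np.random.default_rng(seed)
    # standard k-means, using 1 - IoU as the distance between (w, h) pairs
    anchors = boxes[rng.choice(len(boxes), k, replace=False)].astype(float)
    for _ in range(iters):
        assign = iou_wh(boxes, anchors).argmax(1)
        for j in range(k):
            if np.any(assign == j):
                anchors[j] = boxes[assign == j].mean(0)
    # genetic refinement: random multiplicative mutations, keep improvements
    best, best_fit = anchors.copy(), fitness(boxes, anchors)
    for _ in range(mutations):
        factor = rng.normal(1.0, sigma, best.shape).clip(0.5, 1.5)
        cand = best * factor
        if fitness(boxes, cand) > best_fit:
            best, best_fit = cand, fitness(boxes, cand)
    return best
```

    The genetic stage can only improve the fitness, since a mutated candidate replaces the incumbent only when it matches the ground-truth box shapes better.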

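    Ghost convolution keeps only a fraction of an ordinary convolution's output channels as "intrinsic" feature maps computed by a real convolution, and generates the remaining "ghost" channels with cheap linear operations (typically depthwise filters). A minimal NumPy sketch of the idea follows; the 1×1 primary convolution, the even intrinsic/ghost split, and the fixed 3×3 averaging kernel standing in for the learned cheap operation are illustrative assumptions, not the paper's exact configuration:

```python
import numpy as np

def ghost_conv(x, w_primary, out_channels):
    """Ghost convolution sketch: half the output channels come from a real
    (here 1x1 pointwise) convolution, the other half from a cheap 3x3
    depthwise operation applied to those intrinsic maps."""
    c, h, w = x.shape
    m = out_channels // 2                       # number of intrinsic channels
    assert w_primary.shape == (m, c)
    # primary convolution: 1x1 pointwise, (m, c) x (c, h, w) -> (m, h, w)
    intrinsic = np.tensordot(w_primary, x, axes=([1], [0]))
    # cheap operation: fixed 3x3 depthwise averaging with edge padding
    padded = np.pad(intrinsic, ((0, 0), (1, 1), (1, 1)), mode="edge")
    ghost = np.zeros_like(intrinsic)
    for dy in range(3):
        for dx in range(3):
            ghost += padded[:, dy:dy + h, dx:dx + w]
    ghost /= 9.0
    # concatenate intrinsic and ghost maps to form the full output
    return np.concatenate([intrinsic, ghost], axis=0)
```

    Relative to a full pointwise convolution with `out_channels * c` learned weights, this sketch learns only half of them, which is the source of the parameter and FLOP savings that GhostNet-style backbones report.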
Cite this article:

皇甫俊逸, 孟乔, 孟令辰, 谢宇鹏. YOLOv5 Traffic Object Detection Based on GhostNet and Attention Mechanism. Computer Systems & Applications, 2023, 32(4): 149–160.
History
  • Received: 2022-09-20
  • Revised: 2022-10-19
  • Published online: 2023-02-24
Copyright: Institute of Software, Chinese Academy of Sciences