Improved Detection Network Based on YOLOx Residual Block Fusion CoA Module
Authors: An Henan, Yang Jiazhou, Deng Wucai, Guan Cong, Ma Chao
    Abstract:

    YOLOx-Darknet53 is a detection network that upgrades the YOLOv3 (you only look once, version 3) baseline with a collection of added tricks. Because it still uses Darknet53 as its feature-extraction backbone, its feature extraction capability remains limited. In this study, we derive a contextual attention (CoA) module from the attention mechanism in CoTNet and use it to replace the 3×3 convolution in the residual blocks of the YOLOx backbone. The resulting attention-fused residual block strengthens the feature extraction capability of the backbone. In comparison experiments on the Pascal VOC2007 data set, the network with the CoA module is 1.4 higher than the original network in both mean average precision AP@[.5:.95] and AP@0.5. After the backbone is improved, a parameter-free 3D attention module is added in front of the YOLOx detection head to obtain the final improved detection network. Under the same comparison experiments, the final network is 1.6 higher in AP@[.5:.95] and 1.5 higher in AP@0.5 than the original network. The improved network is therefore more accurate than the original network and can achieve better detection results in industrial applications.
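The abstract does not name the parameter-free 3D attention module placed in front of the detection head. As a minimal sketch, assuming it follows the SimAM formulation (Yang et al., 2021), the weighting can be written in plain Python as below. The function name `simam`, the nested-list layout of the feature map, and the default `e_lambda=1e-4` are illustrative assumptions, not details taken from the paper.

```python
import math


def simam(feature_map, e_lambda=1e-4):
    """Apply SimAM-style parameter-free attention to one (C, H, W) feature map.

    feature_map: list of C channels, each a list of H rows of W floats
                 (hypothetical layout chosen for this sketch).
    e_lambda:    small regularisation constant (assumed default).
    """
    output = []
    for channel in feature_map:
        values = [v for row in channel for v in row]
        n = len(values) - 1  # the "other" neurons in the same channel
        mean = sum(values) / len(values)
        sq_dev_sum = sum((v - mean) ** 2 for v in values)
        variance = sq_dev_sum / n if n > 0 else 0.0

        new_channel = []
        for row in channel:
            new_row = []
            for v in row:
                # Inverse energy: neurons far from the channel mean score higher.
                inv_energy = (v - mean) ** 2 / (4.0 * (variance + e_lambda)) + 0.5
                # Sigmoid gate turns the score into a per-neuron (3D) weight.
                weight = 1.0 / (1.0 + math.exp(-inv_energy))
                new_row.append(v * weight)
            new_channel.append(new_row)
        output.append(new_channel)
    return output


# One channel, 2x2 map: the outlier (5.0) receives a larger weight than the rest.
x = [[[1.0, 1.0], [1.0, 5.0]]]
y = simam(x)
```

Because every weight is computed from channel statistics alone, the module adds no learnable parameters, which is consistent with the "parameter-free" description in the abstract. In a real network this would operate on batched tensors rather than nested lists.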

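Cite this article

An HN, Yang JZ, Deng WC, Guan C, Ma C. Improved detection network based on YOLOx residual block fusion CoA module. Computer Systems & Applications, 2022, 31(8): 245-251.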

History
  • Received: 2021-10-30
  • Last revised: 2021-11-29
  • Published online: 2022-04-18