Transformer-based Cross Scale Interactive Learning for Camouflage Object Detection
Authors: Li Jiandong, Wang Yan, Qu Haicheng
Funding: General Program of the National Natural Science Foundation of China (42271409); Basic Scientific Research Project of Liaoning Provincial Higher Education Institutions (LIKMZ20220699)

Abstract:

Camouflaged object detection (COD) aims to accurately and efficiently detect camouflaged objects that are highly similar to their background. COD methods can support species protection, medical lesion detection, and military surveillance, and therefore have high practical value. In recent years, applying deep learning to COD has become an emerging research direction. However, most existing COD algorithms use a convolutional neural network (CNN) as the feature extraction backbone and, when combining multi-level features, ignore the influence of feature representation and fusion strategies on detection performance. Because CNN-based COD models are weak at extracting the global features of the detected object, this study proposes a Transformer-based cross-scale interactive learning method for COD. The model first introduces a dual-branch feature fusion module that fuses features refined by iterative attention, so that high-level and low-level features are combined more effectively. Second, a multi-scale global context module is introduced to fully exploit contextual information and enhance the features. Finally, a multi-channel pooling module is proposed to focus on the local information of the detected object and improve detection accuracy. Experimental results on the CHAMELEON, CAMO, and COD10K datasets show that, compared with current mainstream COD algorithms, the proposed method produces clearer prediction maps and achieves higher detection accuracy.
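
The abstract describes the three modules only at a high level, and the authors' code is not reproduced here. As a rough illustration of what "fusing high- and low-level features through iterative attention" can look like, the following is a minimal PyTorch-style sketch. It assumes backbone stages that have already been projected to a common channel count; the class names (ChannelAttention, DualBranchFusion), the two-pass weighting scheme, and all tensor sizes are hypothetical and are not taken from the paper.

```python
# Minimal sketch (not the authors' code) of fusing a low-level and a high-level
# feature map with channel-attention weights that are refined in a second pass.
import torch
import torch.nn as nn
import torch.nn.functional as F


class ChannelAttention(nn.Module):
    """Globally pooled 1x1-conv MLP that produces per-channel fusion weights."""
    def __init__(self, channels: int, reduction: int = 4):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(channels, channels // reduction, kernel_size=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels // reduction, channels, kernel_size=1),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return torch.sigmoid(self.mlp(x))


class DualBranchFusion(nn.Module):
    """Fuse a high-level (semantic) and a low-level (detail) feature map."""
    def __init__(self, channels: int):
        super().__init__()
        self.att1 = ChannelAttention(channels)
        self.att2 = ChannelAttention(channels)  # second, "iterative" attention pass

    def forward(self, low: torch.Tensor, high: torch.Tensor) -> torch.Tensor:
        # Upsample the coarse high-level map to the low-level resolution.
        high = F.interpolate(high, size=low.shape[2:], mode="bilinear", align_corners=False)
        # First pass: initial per-channel mixing weights from the summed features.
        w = self.att1(low + high)
        fused = w * low + (1.0 - w) * high
        # Second pass: refine the weights using the fused result.
        w = self.att2(fused)
        return w * low + (1.0 - w) * high


if __name__ == "__main__":
    low = torch.randn(1, 64, 88, 88)   # e.g. an early backbone stage
    high = torch.randn(1, 64, 22, 22)  # e.g. a late stage, already reduced to 64 channels
    print(DualBranchFusion(64)(low, high).shape)  # torch.Size([1, 64, 88, 88])
```

In a complete model, such a fusion block would typically sit between adjacent stages of the Transformer (e.g. PVT-style) backbone in a top-down decoder; the second attention pass above is only one way to interpret the "iterative attention" mentioned in the abstract.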

Cite this article:

Li JD, Wang Y, Qu HC. Transformer-based cross scale interactive learning for camouflage object detection. Computer Systems & Applications, 2024, 33(2): 115–124. (in Chinese)

History
  • Received: 2023-08-06
  • Revised: 2023-09-09
  • Published online: 2023-12-18
  • Publication date: 2024-02-05