Abstract: Camouflaged object detection (COD) aims to accurately and efficiently detect camouflaged objects that are highly similar to their background. COD methods can assist in species protection, medical image analysis, and military surveillance, and therefore have high practical value. In recent years, detecting camouflaged objects with deep learning has become an emerging research direction. However, most existing COD algorithms use a convolutional neural network (CNN) as the feature extraction backbone and, when combining multi-level features, ignore the influence of feature representation and fusion strategies on detection performance. Because CNN-based COD models have a limited ability to capture the global features of the detected object, this study proposes a Transformer-based cross-scale interactive learning method for camouflaged object detection. The model first proposes a dual-branch feature fusion module, which applies iterative attention to features before fusion so that high- and low-level features are combined more effectively. Second, a multi-scale global context information module is introduced to fully integrate contextual information and enhance the features. Finally, a multi-channel pooling module is proposed, which focuses on the local information of the detected object and improves detection accuracy. Experimental results on the CHAMELEON, CAMO, and COD10K datasets show that the proposed method generates clearer prediction maps and achieves higher accuracy than current mainstream camouflaged object detection algorithms.
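
To make the fusion idea concrete, the sketch below shows one plausible way a dual-branch feature fusion module with iterative attention could be organized in PyTorch. The class name `DualBranchFusion`, the choice of channel and spatial attention, and all hyperparameters are illustrative assumptions and do not reproduce the authors' implementation.

```python
# Minimal illustrative sketch (not the paper's code): a dual-branch fusion block
# that iteratively applies attention before fusing a low-level (detail) feature
# and a high-level (semantic) feature.
import torch
import torch.nn as nn
import torch.nn.functional as F


class DualBranchFusion(nn.Module):
    def __init__(self, channels: int, iterations: int = 2):
        super().__init__()
        self.iterations = iterations
        # Channel attention (squeeze-and-excitation style) driven by the low-level branch.
        self.channel_att = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(channels, channels // 4, kernel_size=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels // 4, channels, kernel_size=1),
            nn.Sigmoid(),
        )
        # Spatial attention driven by the high-level branch.
        self.spatial_att = nn.Sequential(
            nn.Conv2d(channels, 1, kernel_size=7, padding=3),
            nn.Sigmoid(),
        )
        self.fuse = nn.Conv2d(2 * channels, channels, kernel_size=3, padding=1)

    def forward(self, low: torch.Tensor, high: torch.Tensor) -> torch.Tensor:
        # Upsample the high-level feature to the low-level spatial resolution.
        high = F.interpolate(high, size=low.shape[-2:], mode="bilinear", align_corners=False)
        for _ in range(self.iterations):
            # Iteratively refine each branch using attention derived from the other branch.
            high = high * self.channel_att(low)
            low = low * self.spatial_att(high)
        return self.fuse(torch.cat([low, high], dim=1))


if __name__ == "__main__":
    low_feat = torch.randn(1, 64, 88, 88)   # low-level feature map (fine detail)
    high_feat = torch.randn(1, 64, 22, 22)  # high-level feature map (semantics)
    fused = DualBranchFusion(channels=64)(low_feat, high_feat)
    print(fused.shape)  # torch.Size([1, 64, 88, 88])
```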