Self-supervised Learning Based on Multi-modal Arbitrary Rotation for RGB-D Semantic Segmentation

Authors: 李鸿宇, 张宜飞, 杨东宝

Funding: National Natural Science Foundation of China, General Program (62376266); Chinese Academy of Sciences Basic Frontier Science Research Program, "From 0 to 1" Original Innovation Project (ZDBS-LY-7024)
Code: https://github.com/Physu/ArbRot

Abstract:

Self-supervised learning on RGB-D data has attracted extensive attention. However, most methods focus on global-level representation learning, which tends to lose the local details that are crucial for recognizing objects. The geometric consistency between the image and depth modalities of RGB-D data can serve as a clue to guide self-supervised feature learning. This study proposes ArbRot, which not only rotates inputs by arbitrary, unrestricted angles and generates multiple pseudo-labels for pretext tasks, but also establishes the relationship between global and local context. ArbRot can be jointly trained with contrastive learning methods to build a multi-modal, multi-pretext-task self-supervised learning framework, which enforces feature-representation consistency between the image and depth views and thereby provides an effective initialization for RGB-D semantic segmentation. Experimental results on SUN RGB-D and NYU Depth Dataset V2 show that the feature representations obtained by multi-modal arbitrary-rotation self-supervised learning are of higher quality than those of the baseline models.
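The pretext task described above can be sketched roughly as follows: both modalities of an RGB-D pair are rotated by the same unrestricted angle, and a discretised pseudo-label is derived from that angle for a rotation-prediction head. This is a minimal illustration in PyTorch with hypothetical function names, not the authors' released implementation (see the repository linked above for that):

```python
import math
import random

import torch
import torch.nn.functional as F


def arbitrary_rotate(x, angle_deg):
    """Rotate a (C, H, W) tensor by an arbitrary angle via an affine grid."""
    theta = math.radians(angle_deg)
    # 2x3 affine matrix for a rotation about the image centre
    # (affine_grid works in normalised [-1, 1] coordinates).
    rot = torch.tensor([[math.cos(theta), -math.sin(theta), 0.0],
                        [math.sin(theta),  math.cos(theta), 0.0]])
    grid = F.affine_grid(rot.unsqueeze(0), [1, *x.shape], align_corners=False)
    return F.grid_sample(x.unsqueeze(0), grid, align_corners=False).squeeze(0)


def make_rotation_sample(rgb, depth, num_bins=360):
    """Apply the same unrestricted rotation to both modalities and
    return a discretised angle pseudo-label for the pretext task."""
    angle = random.uniform(0.0, 360.0)   # arbitrary angle, not limited to multiples of 90
    label = int(angle) % num_bins        # pseudo-label for rotation classification
    return arbitrary_rotate(rgb, angle), arbitrary_rotate(depth, angle), label
```

Because the rotation angle is shared across the RGB and depth views, predicting it from either modality exploits exactly the geometric consistency the abstract mentions; in the full framework this rotation loss would be combined with a contrastive loss over the two modalities.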

Cite this article:

李鸿宇, 张宜飞, 杨东宝. Self-supervised learning based on multi-modal arbitrary rotation for RGB-D semantic segmentation. 计算机系统应用 (Computer Systems & Applications), 2024, 33(1): 219–230.
History:
  • Received: 2023-06-29
  • Revised: 2023-07-27
  • Published online: 2023-11-24
  • Published: 2024-01-05