Computer Systems Applications, English version: 2024, 33(1): 219-230
Self-supervised Learning Based on Multi-modal Arbitrary Rotation for RGB-D Semantic Segmentation
(1. Institute of Information Engineering, Chinese Academy of Sciences, Beijing 100085, China; 2. School of Cyber Security, University of Chinese Academy of Sciences, Beijing 100049, China)
Received: June 29, 2023    Revised: July 27, 2023
Abstract: Self-supervised learning on RGB-D data has attracted extensive attention. However, most methods focus on global-level representation learning and tend to lose the local details that are crucial for recognizing objects. Because the image and the depth map in RGB-D data are geometrically consistent, this consistency can serve as a cue to guide self-supervised feature learning. This study proposes ArbRot, which rotates inputs by unrestricted angles and generates multiple pseudo-labels for the pretext task, and which also establishes contextual relationships between global and local regions. ArbRot can be trained jointly with contrastive learning methods to build a multi-modal, multi-pretext-task self-supervised learning framework that enforces feature consistency across the image and depth views, thereby providing an effective initialization for RGB-D semantic segmentation. Experiments on the SUN RGB-D and NYU Depth Dataset V2 datasets show that the feature representations learned through multi-modal arbitrary-rotation self-supervised learning are of higher quality than those of the baseline models. Open-source code: https://github.com/Physu/ArbRot.
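The abstract describes the pretext task only in outline, so the following is a minimal sketch of the general idea and not the authors' implementation. It rotates an RGB image and its depth map by the same freely sampled angle, which preserves their geometric alignment, and derives two pseudo-labels from that angle. The label design (a coarse angle bin plus an offset inside the bin), the nearest-neighbour rotation, and all function names are assumptions made for illustration; the actual ArbRot code is in the repository linked above.

```python
import math
import random


def rotation_pseudo_labels(angle_deg, num_bins=4):
    """Derive two illustrative pseudo-labels from one rotation angle.

    Assumed label design (not taken from the paper): a coarse bin index
    for classification and the remaining offset in degrees for regression.
    """
    angle = angle_deg % 360.0
    bin_width = 360.0 / num_bins
    coarse_bin = min(int(angle // bin_width), num_bins - 1)
    offset = angle - coarse_bin * bin_width
    return {"bin": coarse_bin, "offset": offset}


def rotate_nearest(grid, angle_deg, fill=0):
    """Rotate a 2D grid (list of rows) about its centre.

    Uses inverse mapping with nearest-neighbour sampling; pixels whose
    source falls outside the grid are set to `fill`.
    """
    h, w = len(grid), len(grid[0])
    cy, cx = (h - 1) / 2.0, (w - 1) / 2.0
    theta = math.radians(angle_deg)
    cos_t, sin_t = math.cos(theta), math.sin(theta)
    out = [[fill] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            dx, dy = x - cx, y - cy
            # Source coordinates for this output pixel.
            sx = cos_t * dx - sin_t * dy + cx
            sy = sin_t * dx + cos_t * dy + cy
            ix, iy = round(sx), round(sy)
            if 0 <= ix < w and 0 <= iy < h:
                out[y][x] = grid[iy][ix]
    return out


def make_sample(rgb_channels, depth, rng, num_bins=4):
    """Apply one shared random rotation to both modalities.

    `rgb_channels` is a list of 2D grids (one per colour channel) and
    `depth` is a single 2D grid of the same size.
    """
    angle = rng.uniform(0.0, 360.0)  # unrestricted angle, not just 0/90/180/270
    rotated_rgb = [rotate_nearest(ch, angle) for ch in rgb_channels]
    rotated_depth = rotate_nearest(depth, angle)
    return rotated_rgb, rotated_depth, rotation_pseudo_labels(angle, num_bins)
```

In a real pipeline the rotation would normally be done by an image library with interpolation, and a network would be trained to predict the pseudo-labels from the rotated inputs; both are omitted here to keep the sketch dependency-free.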
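Funding: General Program of the National Natural Science Foundation of China (62376266); "From 0 to 1" Original Innovation Project of the Basic Frontier Science Research Program, Chinese Academy of Sciences (ZDBS-LY-7024)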
Cite as:
LI Hong-Yu, ZHANG Yi-Fei, YANG Dong-Bao. Self-supervised Learning Based on Multi-modal Arbitrary Rotation for RGB-D Semantic Segmentation. COMPUTER SYSTEMS APPLICATIONS, 2024, 33(1): 219-230