
Computer Systems Applications (English edition), 2024, 33(1): 219-230

Self-supervised Learning Based on Multi-modal Arbitrary Rotation for RGB-D Semantic Segmentation

LI Hong-Yu^1,2, ZHANG Yi-Fei^1,2, YANG Dong-Bao^1,2

(1.Institute of Information Engineering, Chinese Academy of Sciences, Beijing 100085, China;2.School of Cyber Security, University of Chinese Academy of Sciences, Beijing 100049, China)

Received: June 29, 2023; Revised: July 27, 2023

Abstract: Self-supervised learning on RGB-D datasets has attracted extensive attention. However, most methods focus on global-level representation learning, which tends to lose the local details that are crucial for recognizing objects. Because the image and depth in RGB-D data are geometrically consistent, this consistency can serve as a cue to guide self-supervised feature learning on RGB-D data. In this study, we propose ArbRot, which supports rotation by arbitrary, unrestricted angles and generates multiple pseudo-labels for pretext tasks, while also establishing contextual relationships between global and local context. ArbRot can be jointly trained with contrastive learning methods to build a multi-modal, multi-pretext-task self-supervised learning framework that enforces feature consistency between the image and depth views, thereby providing an effective initialization for RGB-D semantic segmentation. Experimental results on the SUN RGB-D and NYU Depth Dataset V2 datasets show that the feature representations learned by multi-modal arbitrary-rotation self-supervised learning are of higher quality than those of the baseline models. Source code: https://github.com/Physu/ArbRot.

Keywords: self-supervised learning; pretext task; contrastive learning; RGB-D; multi-modal
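The abstract describes ArbRot only at a high level. As a rough illustration of the general idea, and not the authors' implementation (which is at https://github.com/Physu/ArbRot), the Python sketch below rotates an RGB image and its aligned depth map by the same arbitrary angle, so the two modalities stay geometrically consistent, and derives two pseudo-labels from that single angle. The helper names `rotate_nearest` and `make_rotation_sample`, the 8-bin angle discretization, and the (sin, cos) regression target are hypothetical choices made for this example; ArbRot's actual pretext tasks, including how it relates global and local context, may differ.

```python
import math

import numpy as np


def rotate_nearest(img: np.ndarray, angle_deg: float, fill: float = 0.0) -> np.ndarray:
    """Rotate an (H, W) or (H, W, C) array about its centre by angle_deg.

    Uses inverse mapping with nearest-neighbour sampling. Because image rows
    grow downwards, a positive angle appears clockwise on screen. Output
    pixels whose source falls outside the input are set to `fill`.
    """
    h, w = img.shape[:2]
    cy, cx = (h - 1) / 2.0, (w - 1) / 2.0
    theta = math.radians(angle_deg)
    cos_t, sin_t = math.cos(theta), math.sin(theta)

    ys, xs = np.indices((h, w), dtype=np.float64)
    dx, dy = xs - cx, ys - cy
    # Inverse rotation: locate the source pixel that lands on each output pixel.
    src_x = np.rint(cos_t * dx + sin_t * dy + cx).astype(int)
    src_y = np.rint(-sin_t * dx + cos_t * dy + cy).astype(int)
    valid = (src_x >= 0) & (src_x < w) & (src_y >= 0) & (src_y < h)

    out = np.full_like(img, fill)
    out[valid] = img[src_y[valid], src_x[valid]]
    return out


def make_rotation_sample(rgb: np.ndarray, depth: np.ndarray,
                         rng: np.random.Generator, num_bins: int = 8):
    """Hypothetical pretext sample: one arbitrary angle shared by both modalities."""
    angle = float(rng.uniform(0.0, 360.0))  # unrestricted angle in [0, 360)
    rgb_rot = rotate_nearest(rgb, angle)
    depth_rot = rotate_nearest(depth, angle)  # same angle keeps RGB and depth aligned
    rad = math.radians(angle)
    labels = {
        # Classification target: which of num_bins equal sectors the angle falls in.
        "angle_bin": int(angle // (360.0 / num_bins)),
        # Regression target: a continuous encoding that has no wrap-around at 360.
        "sin_cos": (math.sin(rad), math.cos(rad)),
    }
    return rgb_rot, depth_rot, labels


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    rgb = rng.integers(0, 256, size=(32, 32, 3), dtype=np.uint8)
    depth = rng.uniform(0.5, 5.0, size=(32, 32)).astype(np.float32)
    rgb_r, depth_r, labels = make_rotation_sample(rgb, depth, rng)
    print(rgb_r.shape, depth_r.shape, labels["angle_bin"])
```

Nearest-neighbour sampling is used here so that rotating the depth map never blends depth values across object boundaries; a real pipeline might instead use bilinear interpolation for the RGB view only.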

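Foundation items: General Program of the National Natural Science Foundation of China (62376266); From-0-to-1 Original Innovation Project of the Basic Frontier Scientific Research Program, Chinese Academy of Sciences (ZDBS-LY-7024)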

Citation:
LI Hong-Yu, ZHANG Yi-Fei, YANG Dong-Bao. Self-supervised Learning Based on Multi-modal Arbitrary Rotation for RGB-D Semantic Segmentation. COMPUTER SYSTEMS APPLICATIONS, 2024, 33(1): 219-230

Author Name	Affiliation	E-mail
LI Hong-Yu	Institute of Information Engineering, Chinese Academy of Sciences, Beijing 100085, China; School of Cyber Security, University of Chinese Academy of Sciences, Beijing 100049, China
ZHANG Yi-Fei	Institute of Information Engineering, Chinese Academy of Sciences, Beijing 100085, China; School of Cyber Security, University of Chinese Academy of Sciences, Beijing 100049, China
YANG Dong-Bao	Institute of Information Engineering, Chinese Academy of Sciences, Beijing 100085, China; School of Cyber Security, University of Chinese Academy of Sciences, Beijing 100049, China	yangdongbao@iie.ac.cn