Self-supervised Learning Based on Multi-modal Arbitrary Rotation for RGB-D Semantic Segmentation

doi:10.15888/j.cnki.csa.009362


Volume 33, Issue 1, 2024, pp. 219-230


Authors: LI Hong-Yu, ZHANG Yi-Fei, YANG Dong-Bao

Affiliation (all authors): Institute of Information Engineering, Chinese Academy of Sciences, Beijing 100085, China; School of Cyber Security, University of Chinese Academy of Sciences, Beijing 100049, China


Abstract:

Self-supervised learning on RGB-D datasets has attracted extensive attention. However, most methods focus on global-level representation learning, which tends to lose the local details that are crucial for recognizing objects. The geometric consistency between the image and the depth map in RGB-D data can serve as a cue to guide self-supervised feature learning. This study proposes ArbRot, which not only rotates inputs by unrestricted angles and generates multiple pseudo-labels for pretext tasks, but also establishes the relationship between global and local context. ArbRot can be trained jointly with contrastive learning methods to form a multi-modal, multi-pretext-task self-supervised learning framework that enforces feature consistency within the image and depth views, thereby providing an effective initialization for RGB-D semantic segmentation. Experimental results on the SUN RGB-D and NYU Depth Dataset V2 datasets show that the feature representations learned by multi-modal, arbitrary-orientation rotation self-supervised learning are of higher quality than those of the baseline models.
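
The abstract does not specify how the rotated inputs or pseudo-labels are constructed. As a rough illustration only, the sketch below applies one randomly drawn angle to both an RGB image and its depth map (so the two stay geometrically aligned) and derives several targets from that angle. The helper names `rotate_nearest` and `make_rotation_sample`, the nearest-neighbour resampling, the choice of four angle bins, and the sine/cosine target are all assumptions made for this example; they are not taken from the paper.

```python
import numpy as np

def rotate_nearest(img, angle_deg):
    """Rotate an H x W (x C) array about its centre by angle_deg.

    Hypothetical helper: nearest-neighbour resampling, zero fill
    for pixels that fall outside the source image.
    """
    h, w = img.shape[:2]
    theta = np.deg2rad(angle_deg)
    cy, cx = (h - 1) / 2.0, (w - 1) / 2.0
    ys, xs = np.mgrid[0:h, 0:w]
    dx, dy = xs - cx, ys - cy
    # Inverse mapping: locate the source pixel for every output pixel.
    src_x = np.cos(theta) * dx + np.sin(theta) * dy + cx
    src_y = -np.sin(theta) * dx + np.cos(theta) * dy + cy
    sx = np.rint(src_x).astype(int)
    sy = np.rint(src_y).astype(int)
    valid = (sx >= 0) & (sx < w) & (sy >= 0) & (sy < h)
    out = np.zeros_like(img)
    out[valid] = img[sy[valid], sx[valid]]
    return out

def make_rotation_sample(rgb, depth, rng, num_bins=4):
    """Rotate RGB and depth by the same random angle and build pseudo-labels.

    The label set (raw angle, coarse bin, sine/cosine pair) is an
    assumption for illustration, not the paper's definition.
    """
    angle = float(rng.uniform(0.0, 360.0))
    rad = np.deg2rad(angle)
    labels = {
        "angle_deg": angle,
        "bin": min(int(angle // (360.0 / num_bins)), num_bins - 1),
        "sin_cos": (float(np.sin(rad)), float(np.cos(rad))),
    }
    return rotate_nearest(rgb, angle), rotate_nearest(depth, angle), labels

rng = np.random.default_rng(0)
rgb = rng.random((8, 8, 3))    # stand-in RGB image
depth = rng.random((8, 8))     # stand-in depth map
rgb_rot, depth_rot, labels = make_rotation_sample(rgb, depth, rng)
print(rgb_rot.shape, depth_rot.shape, sorted(labels))
```

Sharing a single angle across the two modalities is one simple way to preserve the image-depth geometric consistency the abstract refers to; a pretext head could then be trained to predict any of the returned targets from either modality.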

Keywords: self-supervised learning; pretext task; contrastive learning; RGB-D; multi-modal

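Citation:

LI Hong-Yu, ZHANG Yi-Fei, YANG Dong-Bao. Self-supervised Learning Based on Multi-modal Arbitrary Rotation for RGB-D Semantic Segmentation. Computer Systems & Applications, 2024, 33(1): 219-230.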


History

Received: June 29, 2023
Revised: July 27, 2023
Online: November 24, 2023
Published: January 05, 2024
