Multi-view Stereo Depth Perception for Embedded Platform

doi:10.15888/j.cnki.csa.009078

Computer Systems & Applications, 2023, Vol. 32, No. 5, pp. 105-111

Abstract:

Current neural-network-based multi-view stereo (MVS) depth estimation algorithms have large parameter counts and heavy memory consumption, which makes it difficult for them to meet the needs of today's low-compute embedded platforms. This study therefore proposes Mobile-MVS2D, an MVS depth perception network based on the MVS2D epipolar attention mechanism and MobileNetV3-Small. The network adopts an encoder-decoder structure and uses MobileNetV3-Small as the encoder for feature extraction. It applies epipolar attention to couple scale information across different feature layers of the source and reference images. In the decoder it introduces SE-Net (squeeze-and-excitation) modules and skip connections to enrich decoded feature details and improve prediction accuracy. Experimental results show that the proposed model achieves high accuracy on the depth-map evaluation metrics on the ScanNet dataset. Combined with visual SLAM, it produces fairly accurate 3D reconstructions and shows good robustness. On a Jetson Xavier NX, inference on a group of 640×480 images at Float16 precision takes only 0.17 s and uses only 1 GB of GPU memory, so the model meets the needs of low-compute embedded platforms.
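
The abstract names MVS2D epipolar attention as the mechanism that couples source and reference features. What follows is a simplified, hypothetical illustration of the general idea, not the authors' Mobile-MVS2D code. For one reference pixel, the sketch projects a set of depth hypotheses into the source view. Those projected points all lie on the pixel's epipolar line. It then samples the source feature at each point and scores it against the reference feature with a scaled dot product. Finally, it turns the softmax weights into a soft-argmax depth. The sketch makes these assumptions:

- Both views share one zero-skew pinhole intrinsic matrix K.
- (R, t) maps reference-camera coordinates to source-camera coordinates.
- Features are plain Python lists.
- Sampling is nearest-neighbour.
- The attention values are the depth hypotheses themselves.

The function names `project`, `sample_nearest` and `epipolar_attention_depth` are illustrative.

```python
import math

# Simplified, hypothetical illustration of attention along an epipolar line.
# Not the Mobile-MVS2D implementation; names and conventions are assumptions.


def project(u, v, depth, K, R, t):
    """Project reference pixel (u, v) at a hypothesised depth into the source view.

    Assumes a shared zero-skew pinhole K and that (R, t) maps reference-camera
    coordinates to source-camera coordinates. Returns None if the 3D point
    lies behind the source camera.
    """
    fx, fy, cx, cy = K[0][0], K[1][1], K[0][2], K[1][2]
    # Back-project the pixel to a 3D point in the reference camera frame.
    p = ((u - cx) / fx * depth, (v - cy) / fy * depth, depth)
    # Rigid transform into the source camera frame.
    q = [sum(R[i][j] * p[j] for j in range(3)) + t[i] for i in range(3)]
    if q[2] <= 0:
        return None
    # Perspective projection onto the source image plane.
    return fx * q[0] / q[2] + cx, fy * q[1] / q[2] + cy


def sample_nearest(feat, xy):
    """Nearest-neighbour lookup in an H x W grid of feature vectors (None if outside)."""
    if xy is None:
        return None
    iu, iv = round(xy[0]), round(xy[1])
    if 0 <= iv < len(feat) and 0 <= iu < len(feat[0]):
        return feat[iv][iu]
    return None


def epipolar_attention_depth(query, src_feat, u, v, depths, K, R, t):
    """Soft-argmax depth for one reference pixel from attention over depth hypotheses."""
    scale = 1.0 / math.sqrt(len(query))
    scores, kept = [], []
    for d in depths:
        key = sample_nearest(src_feat, project(u, v, d, K, R, t))
        if key is None:
            continue  # this hypothesis falls outside the source image
        # Scaled dot-product similarity between reference query and source key.
        scores.append(scale * sum(a * b for a, b in zip(query, key)))
        kept.append(d)
    if not kept:
        return None
    # Numerically stable softmax over the surviving hypotheses.
    m = max(scores)
    weights = [math.exp(s - m) for s in scores]
    total = sum(weights)
    # Attention values are the depth hypotheses themselves (an assumption),
    # so the weighted sum is a soft-argmax depth estimate.
    return sum(w * d for w, d in zip(weights, kept)) / total
```

Consider a purely horizontal baseline, with R = I and t = (-b, 0, 0). The epipolar line is then the image row, and each hypothesis d shifts the column by the disparity fx·b/d. A source feature that matches the reference query at only one column therefore pulls the estimate to that column's depth. A trained network would instead use learned projections, sub-pixel (e.g. bilinear) sampling and multi-scale features.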

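The decoder's SE-Net modules recalibrate channels in three steps:

1. Squeeze each channel to one scalar by global average pooling.
2. Pass those scalars through a small two-layer bottleneck to produce one gate per channel.
3. Rescale each channel by its gate.

What follows is a minimal plain-Python sketch of a standard SE block. The function name `se_block`, the weight layout and the reduction ratio r are illustrative assumptions. The abstract does not state which gate function or ratio the decoder uses, so the plain sigmoid from the original SE-Net formulation is assumed here.

```python
import math

# Squeeze-and-excitation (SE) block on a C x H x W feature map in plain Python.
# A minimal sketch: the weights are hypothetical placeholders, and the plain
# sigmoid gate follows the original SE-Net formulation (an assumption about
# the variant used in the paper's decoder).


def se_block(x, w1, b1, w2, b2):
    """Return (reweighted feature map, per-channel gates).

    x  : list of C channels, each an H x W list of floats
    w1 : (C // r) x C weights, b1 : C // r biases   (channel reduction)
    w2 : C x (C // r) weights, b2 : C biases        (channel expansion)
    """
    # Squeeze: global average pooling gives one descriptor per channel.
    z = [sum(sum(row) for row in ch) / (len(ch) * len(ch[0])) for ch in x]
    # Excitation, step 1: fully connected reduction followed by ReLU.
    hidden = [max(0.0, sum(w * zi for w, zi in zip(row, z)) + b)
              for row, b in zip(w1, b1)]
    # Excitation, step 2: fully connected expansion followed by sigmoid.
    gates = [1.0 / (1.0 + math.exp(-(sum(w * h for w, h in zip(row, hidden)) + b)))
             for row, b in zip(w2, b2)]
    # Scale: multiply every position of channel c by its gate.
    out = [[[g * val for val in row] for row in ch] for g, ch in zip(gates, x)]
    return out, gates
```

As an example, take C = 2 channels whose means are 1 and 3, with r = 2, w1 = [[0.5, 0.5]], w2 = [[1.0], [-1.0]] and zero biases. The gates come out as sigmoid(2) ≈ 0.881 and sigmoid(-2) ≈ 0.119. The block thus emphasises the first channel and suppresses the second, which is the channel-reweighting effect the decoder relies on.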

Cite this article

单兵, 胡益民, 张龙, 李加东. Multi-view Stereo Depth Perception for Embedded Platform. Computer Systems & Applications, 2023, 32(5): 105-111

History

Received: 2022-09-27
Last revised: 2022-10-27
Published online: 2023-03-17