Received: September 27, 2022; Revised: October 27, 2022
Abstract: Current neural-network-based multi-view stereo (MVS) depth estimation algorithms have large parameter counts and heavy memory consumption, which makes them difficult to deploy on today's low-compute embedded platforms. This study therefore proposes Mobile-MVS2D, an MVS depth perception network based on the MVS2D epipolar attention mechanism and MobileNetV3-Small. The network adopts an encoder-decoder structure. The encoder uses MobileNetV3-Small for feature extraction, and an epipolar attention mechanism couples the scale information of the different feature layers between the source images and the reference image. In the decoding stage, SE-Net and skip connections are introduced to enrich the detail of the decoded features and improve prediction accuracy. Experimental results show that the proposed model achieves high accuracy on the depth-map evaluation metrics of the ScanNet dataset. Combined with visual SLAM, the model produces relatively accurate three-dimensional reconstructions and shows good robustness. On the Jetson Xavier NX, inference at Float16 precision on a group of 640×480 images takes only 0.17 s and consumes only 1 GB of GPU memory, which meets the needs of low-compute embedded platforms.
Citation:
SHAN Bing, HU Yi-Min, ZHANG Long, LI Jia-Dong. Multi-view Stereo Depth Perception for Embedded Platform. Computer Systems Applications (计算机系统应用), 2023, 32(5): 105-111