Abstract:The current multi-view stereo (MVS) depth estimation algorithms based on neural networks involve a large number of parameters and serious memory consumption, which is difficult to meet the needs of the current embedded platforms with low-computing power. Therefore, this study proposes an MVS depth perception network (Mobile-MVS2D) based on the MVS2D epipolar attention mechanism and MobileNetV3-Small. The network adopts the structure of encoder-decoder and uses MobileNetV3-Small network for encoding feature extraction. In addition, it adopts the epipolar attention mechanism for the coupling of scale information of different feature layers between the source image and the reference image and introduces SE-Net and jump connection to expand the decoding feature details in the decoding stage and improve the prediction accuracy. Experimental results show that the proposed model shows high accuracy in the evaluation index of depth maps in the ScanNet data set. By Combining with visual SLAM, the model can show a more accurate three-dimensional reconstruction effect and has excellent robustness. On the Jeston Xavier NX, the model only costs 0.17 s in inferring the image group with the accuracy of Float16 and the size of 640×480, and the GPU consumption is only 1 GB. Therefore, it can meet the needs of embedded platforms with low-computing power.