Abstract: Monocular depth estimation is a fundamental problem in computer vision, and the patch-match and plane-regularization network (P2Net) is one of the most advanced unsupervised monocular depth estimation methods. However, the depth prediction network of P2Net upsamples features with nearest-neighbor interpolation, whose calculation process is relatively simple, and as a result the predicted depth maps are of poor quality. Therefore, this study constructs a residual upsampling structure based on multiple upsampling algorithms and uses it to replace the upsampling layer of the original network, so that more feature information is retained and object structures are more complete. Experimental results on the NYU-Depth V2 dataset show that, compared with the original network, the improved P2Net variants based on transposed convolution, bilinear interpolation, and PixelShuffle reduce the root mean square error (RMSE) by 2.25%, 2.73%, and 3.05%, respectively. The proposed residual upsampling structure thus improves the quality of the predicted depth maps and reduces the prediction error.
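The abstract does not specify the exact layout of the residual upsampling structure, so the following is only a minimal sketch under stated assumptions. It takes a residual upsampling unit to be the element-wise sum of a nearest-neighbor skip path (the original P2Net upsampling) and a second branch that uses one of the three alternatives named above: transposed convolution, bilinear interpolation, or PixelShuffle. Learned convolutions are omitted: feature maps are single-channel nested lists, the transposed-convolution kernel is fixed, and the channel expansion before PixelShuffle is replaced by copying the input. All function names (`residual_upsample`, `bilinear_upsample`, and so on) are hypothetical.

```python
from typing import Callable, List

Map = List[List[float]]  # single-channel H x W feature map (hypothetical representation)


def nearest_upsample(x: Map, r: int) -> Map:
    """Nearest-neighbor upsampling by integer factor r: each pixel becomes an r x r block."""
    return [[x[i // r][j // r] for j in range(len(x[0]) * r)] for i in range(len(x) * r)]


def bilinear_upsample(x: Map, r: int) -> Map:
    """Bilinear upsampling using half-pixel centres, clamped at the borders."""
    h, w = len(x), len(x[0])

    def source(coord: int, size: int):
        # Map an output index to a fractional input coordinate and its two neighbours.
        s = min(max((coord + 0.5) / r - 0.5, 0.0), size - 1)
        lo = int(s)
        return lo, min(lo + 1, size - 1), s - lo

    out = []
    for i in range(h * r):
        y0, y1, fy = source(i, h)
        row = []
        for j in range(w * r):
            x0, x1, fx = source(j, w)
            top = x[y0][x0] * (1 - fx) + x[y0][x1] * fx
            bottom = x[y1][x0] * (1 - fx) + x[y1][x1] * fx
            row.append(top * (1 - fy) + bottom * fy)
        out.append(row)
    return out


def transposed_conv_upsample(x: Map, r: int, kernel: Map) -> Map:
    """Transposed convolution with an r x r kernel, stride r, no padding (kernels do not overlap)."""
    return [[x[i // r][j // r] * kernel[i % r][j % r] for j in range(len(x[0]) * r)]
            for i in range(len(x) * r)]


def pixel_shuffle(channels: List[Map], r: int) -> Map:
    """Rearrange r*r maps of size H x W into one (H*r) x (W*r) map.

    Channel i*r + j fills sub-pixel offset (i, j) of every r x r output block.
    """
    assert len(channels) == r * r, "PixelShuffle needs r*r input channels"
    h, w = len(channels[0]), len(channels[0][0])
    return [[channels[(i % r) * r + (j % r)][i // r][j // r] for j in range(w * r)]
            for i in range(h * r)]


def residual_upsample(x: Map, r: int, branch: Callable[[Map, int], Map]) -> Map:
    """Hypothetical residual unit: nearest-neighbor skip path plus an alternative upsampling branch."""
    skip = nearest_upsample(x, r)
    main = branch(x, r)
    return [[a + b for a, b in zip(row_s, row_m)] for row_s, row_m in zip(skip, main)]


if __name__ == "__main__":
    x = [[0.0, 1.0], [2.0, 3.0]]
    # Bilinear branch.
    print(residual_upsample(x, 2, bilinear_upsample)[0])
    # Transposed-convolution branch with a fixed (normally learned) 2 x 2 kernel.
    k = [[1.0, 0.5], [0.5, 0.25]]
    print(residual_upsample(x, 2, lambda m, r: transposed_conv_upsample(m, r, k))[0])
    # PixelShuffle branch: r*r copies of x stand in for a learned channel-expanding convolution.
    print(residual_upsample(x, 2, lambda m, r: pixel_shuffle([m] * (r * r), r))[0])
```

In this sketch every branch produces an output of the same size as the nearest-neighbor path, which is what makes the element-wise residual sum possible. In a real network each branch would carry learned weights, and the residual path lets the extra branch add detail on top of the nearest-neighbor result instead of replacing it.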