Monocular Image Depth Estimation with Adaptive Multi-scale Feature Fusion
Abstract:

In deep-learning-based monocular depth estimation, depth information is lost during downsampling in the convolutional neural network, which degrades depth estimation at object edges. To address this problem, this study presents a multi-scale feature fusion method that adopts an adaptive fusion strategy: the fusion ratios of feature maps at different scales are adjusted dynamically according to the feature data, so that multi-scale feature information is fully exploited. In monocular depth estimation with atrous spatial pyramid pooling (ASPP), the loss of pixel information degrades the predictions for small objects; when ASPP is applied to deep feature maps, the depth estimation result is improved by fusing in the rich feature information of shallow feature maps. Experiments on the NYU-Depth V2 indoor dataset show that the proposed method predicts object edges more accurately and significantly improves the prediction of small objects: the root mean square error (RMSE) reaches 0.389 and the accuracy (δ < 1.25) reaches 0.897, which verifies the effectiveness of the method.
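The adaptive fusion strategy described in the abstract can be sketched as follows: feature maps from several scales (assumed already resampled to a common resolution) are blended with per-pixel weights that sum to one, where the weight logits would come from a small learned branch (e.g. a 1×1 convolution) in the actual network. This is a minimal NumPy illustration of the weighting mechanism only; the function and array shapes are assumptions for exposition, not the paper's implementation.

```python
import numpy as np

def softmax(x, axis=0):
    """Numerically stable softmax along the given axis."""
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def adaptive_fusion(features, weight_logits):
    """Fuse same-resolution feature maps with per-pixel adaptive weights.

    features:      list of S arrays, each of shape (C, H, W), one per scale,
                   already resampled to a common resolution.
    weight_logits: array of shape (S, H, W); in a real network this would be
                   predicted from the features by a learned 1x1 convolution.
    Returns the fused feature map of shape (C, H, W).
    """
    stack = np.stack(features)             # (S, C, H, W)
    w = softmax(weight_logits, axis=0)     # weights sum to 1 at every pixel
    return (stack * w[:, None]).sum(axis=0)  # weighted sum over scales
```

With all-zero logits the weights are uniform and the fusion reduces to a plain average; strongly biased logits let the network select one scale at a given pixel, which is how the fusion ratio adapts to the feature data.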

Get Citation

Chen Guojun, Fu Yunpeng, Yu Lixiang, Cui Tao. Monocular image depth estimation with adaptive multi-scale feature fusion. Computer Systems & Applications, 2024, 33(7): 121–128. (in Chinese)

History
  • Received: November 18, 2023
  • Revised: December 20, 2023
  • Online: June 05, 2024