Lightweight Self-supervised Monocular Depth Estimation
Authors:
  • LIU Jia
  • LIN Xiao
  • CHEN Da-Peng
  • XU Chuang
  • SHI Hao

Affiliations (all authors): School of Automation, Nanjing University of Information Science & Technology, Nanjing 210044, China; Jiangsu Province Engineering Research Center of Intelligent Meteorological Exploration Robot, Nanjing 210044, China; Jiangsu Key Laboratory of Big Data Analysis Technology, Nanjing 210044, China; Jiangsu Collaborative Innovation Center of Atmospheric Environment and Equipment Technology, Nanjing 210044, China
    Abstract:

    Currently, most augmented reality and autonomous driving applications rely not only on the depth information estimated by a depth network but also on the pose information estimated by a pose network. Deploying both networks on an embedded device consumes considerable memory. To address this problem, a method is proposed in which the depth network and the pose network share a feature extractor, keeping the model lightweight. In addition, a lightweight depth network with a linear structure built on depthwise separable convolutions reduces the parameter count without losing much detailed information. Experiments on the KITTI dataset show that, compared with algorithms of the same type, the combined pose and depth networks occupy only 35.33 MB of parameters, while the mean absolute error of the recovered depth maps is maintained at 0.129.
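    The parameter savings behind the depthwise separable design can be sketched with simple arithmetic: a standard k×k convolution couples every input channel to every output channel, whereas the MobileNet-style factorization splits it into a per-channel k×k depthwise convolution followed by a 1×1 pointwise convolution. The channel sizes below are illustrative, not taken from the paper:

```python
def conv_params(c_in: int, c_out: int, k: int) -> int:
    """Weights in a standard k x k convolution (bias omitted)."""
    return c_in * c_out * k * k

def depthwise_separable_params(c_in: int, c_out: int, k: int) -> int:
    """Depthwise k x k (one filter per input channel) + 1 x 1 pointwise."""
    return c_in * k * k + c_in * c_out

# Example: a 3 x 3 layer with 256 input and 256 output channels.
standard = conv_params(256, 256, 3)              # 589824 weights
separable = depthwise_separable_params(256, 256, 3)  # 67840 weights
print(standard, separable, round(standard / separable, 1))
```

    For this layer the factorization cuts the weight count by roughly 8.7×, which is how a depth network built from such blocks stays small without shrinking its receptive field.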

Get Citation

LIU Jia, LIN Xiao, CHEN Da-Peng, XU Chuang, SHI Hao. Lightweight self-supervised monocular depth estimation. Computer Systems & Applications, 2023, 32(8): 116–125.
History
  • Received: February 03, 2023
  • Revised: March 01, 2023
  • Online: May 19, 2023
Copyright: Institute of Software, Chinese Academy of Sciences