An MVS Method Combining Context Enhancement and Image Frequency Guidance

Author: Chen Xi, Liu Mei, Chen Jiasheng

Funding: National Natural Science Foundation of China (62073091)

    Abstract:

    Learning-based multi-view stereo (MVS) matching algorithms have achieved remarkable results, but they still suffer from limited convolutional receptive fields and neglect of image frequency information, which leads to poor matching performance on low-texture, repetitive, and non-Lambertian surfaces. To address these problems, this study proposes CAF-MVSNet, a context-enhanced and image-frequency-guided multi-view stereo matching network. First, in the feature extraction stage, a context enhancement module is fused into the feature pyramid network to effectively enlarge the network's receptive field. An image-frequency-guided attention module is then introduced to capture line, shape, texture, and color information by encoding different frequency bands of the image; this strengthens long-range contextual relations and further addresses the accurate matching of low-texture, repetitive, and non-Lambertian surfaces, enabling reliable feature matching. Experimental results on the DTU dataset show that CAF-MVSNet reduces the overall error by 12.3% compared with the classical cascade model CasMVSNet, demonstrating excellent performance. Good results are also achieved on the Tanks and Temples dataset, reflecting strong generalization performance.
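
    The context enhancement module is only described at a high level in this abstract. Below is a minimal sketch of the general idea, assuming a stack of parallel dilated convolutions in the spirit of refs [18, 20]; the class name ContextEnhance, the dilation rates, and the channel layout are illustrative assumptions, not the paper's exact design.

    import torch
    import torch.nn as nn

    class ContextEnhance(nn.Module):
        """Parallel dilated convolutions enlarge the receptive field
        without losing resolution (cf. refs [18, 20]). Layout and
        channel counts are illustrative, not the paper's design."""
        def __init__(self, channels, dilations=(1, 2, 4, 8)):
            super().__init__()
            # A 3x3 conv with dilation d and padding d keeps H and W,
            # so each branch sees a different context radius.
            self.branches = nn.ModuleList(
                nn.Conv2d(channels, channels, 3, padding=d, dilation=d)
                for d in dilations
            )
            self.fuse = nn.Conv2d(channels * len(dilations), channels, 1)

        def forward(self, x):
            # Concatenate the multi-scale responses, fuse with a 1x1
            # conv, and add back to the input as a residual.
            feats = [torch.relu(b(x)) for b in self.branches]
            return x + self.fuse(torch.cat(feats, dim=1))

    Dropped into an FPN lateral branch, e.g. ContextEnhance(32)(torch.randn(1, 32, 64, 80)), the block widens the effective receptive field while preserving the feature map size expected by the pyramid fusion.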
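    Similarly, the image-frequency-guided attention module can be approximated by splitting features into a low-frequency band (shape, color) and a high-frequency band (lines, texture) with a Gaussian low-pass mask in the Fourier domain (cf. refs [21, 22, 24]), then gating the input on both bands. The sketch below is an assumption about the mechanism, not the published module; FrequencyGuide, sigma, and the sigmoid gate are hypothetical choices.

    import torch
    import torch.nn as nn

    class FrequencyGuide(nn.Module):
        """Splits features into low/high frequency bands via a Gaussian
        low-pass mask in the Fourier domain and learns a gate over the
        two bands. A sketch of the idea only; details may differ."""
        def __init__(self, channels, sigma=0.1):
            super().__init__()
            self.sigma = sigma
            self.gate = nn.Sequential(
                nn.Conv2d(2 * channels, channels, 1), nn.Sigmoid()
            )

        def forward(self, x):
            B, C, H, W = x.shape
            spec = torch.fft.rfft2(x, norm="ortho")       # (B, C, H, W//2+1)
            fy = torch.fft.fftfreq(H, device=x.device)    # cycles/pixel
            fx = torch.fft.rfftfreq(W, device=x.device)
            r2 = fy[:, None] ** 2 + fx[None, :] ** 2
            lowpass = torch.exp(-r2 / (2 * self.sigma ** 2))  # Gaussian mask
            low = torch.fft.irfft2(spec * lowpass, s=(H, W), norm="ortho")
            high = x - low                                # residual = high band
            # Gate fuses shape/color (low) with edge/texture (high) cues.
            return x * self.gate(torch.cat([low, high], dim=1))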
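    For scale on the quantitative claim: DTU reports accuracy and completeness as mean distances in millimeters, and the overall score is their average. Assuming the 12.3% figure is a relative reduction of this overall error, and taking CasMVSNet's published overall of 0.355 mm (ref [10]) as the baseline, the implied result is roughly:

    \mathrm{overall} = \tfrac{1}{2}\left(\mathrm{acc} + \mathrm{comp}\right), \qquad 0.355\ \mathrm{mm} \times (1 - 0.123) \approx 0.311\ \mathrm{mm}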

    References
    [1] Zhu QT, Min C, Wei ZZ, et al. Deep learning for multi-view stereo via plane sweep: A survey. arXiv:2106.15328, 2021.
    [2] Chen XZ, Ma HM, Wan J, et al. Multi-view 3D object detection network for autonomous driving. Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition. Honolulu: IEEE, 2017. 6526–6534.
    [3] Schmid K, Hirschmüller H, Dömel A, et al. View planning for multi-view stereo 3D reconstruction using an autonomous multicopter. Journal of Intelligent & Robotic Systems, 2012, 65(1): 309–323.
    [4] Muzzupappa M, Gallo A, Spadafora F, et al. 3D reconstruction of an outdoor archaeological site through a multi-view stereo technique. Proceedings of the 2013 Digital Heritage International Congress (DigitalHeritage). Marseille: IEEE, 2013. 169–176.
    [5] Cernea D. OpenMVS: Multi-view stereo reconstruction library. https://cdcseacave.github.io/openMVS. [2024-08-25].
    [6] Zheng EL, Dunn E, Jojic V, et al. Patchmatch based joint view selection and depthmap estimation. Proceedings of the 2014 IEEE Conference on Computer Vision and Pattern Recognition. Columbus: IEEE, 2014. 1510–1517.
    [7] Galliani S, Lasinger K, Schindler K. Massively parallel multiview stereopsis by surface normal diffusion. Proceedings of the 2015 IEEE International Conference on Computer Vision. Santiago: IEEE, 2015. 873–881.
    [8] Yao Y, Luo ZX, Li SW, et al. MVSNet: Depth inference for unstructured multi-view stereo. Proceedings of the 15th European Conference on Computer Vision (ECCV). Munich: Springer, 2018. 785–801.
    [9] Yao Y, Luo ZX, Li SW, et al. Recurrent MVSNet for high-resolution multi-view stereo depth inference. Proceedings of the 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Long Beach: IEEE, 2019. 5520–5529.
    [10] Gu XD, Fan ZW, Zhu SY, et al. Cascade cost volume for high-resolution multi-view stereo and stereo matching. Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Seattle: IEEE, 2020. 2492–2501.
    [11] Ding YK, Yuan WT, Zhu QT, et al. TransMVSNet: Global context-aware multi-view stereo network with Transformers. Proceedings of the 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition. New Orleans: IEEE, 2022. 8575–8584.
    [12] Vaswani A, Shazeer N, Parmar N, et al. Attention is all you need. Proceedings of the 31st International Conference on Neural Information Processing Systems. Long Beach: ACM, 2017. 6000–6010.
    [13] Peng R, Wang RJ, Wang ZY, et al. Rethinking depth estimation for multi-view stereo: A unified representation. Proceedings of the 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition. New Orleans: IEEE, 2022. 8635–8644.
    [14] Li JL, Lu ZD, Wang YQ, et al. NR-MVSNet: Learning multi-view stereo based on normal consistency and depth refinement. IEEE Transactions on Image Processing, 2023, 32: 2649–2662.
    [15] Zhang YS, Zhu JK, Lin LX. Multi-view stereo representation revisit: Region-aware MVSNet. Proceedings of the 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Vancouver: IEEE, 2023. 17376–17385.
    [16] Ye XY, Zhao WY, Liu TQ, et al. Constraining depth map geometry for multi-view stereo: A dual-depth approach with saddle-shaped depth cells. Proceedings of the 2023 IEEE/CVF International Conference on Computer Vision. Paris: IEEE, 2023. 17615–17624.
    [17] Vats VK, Joshi S, Crandall DJ, et al. GC-MVSNet: Multi-view, multi-scale, geometrically-consistent multi-view stereo. Proceedings of the 2024 IEEE/CVF Winter Conference on Applications of Computer Vision. Waikoloa: IEEE, 2024. 3230–3240.
    [18] Yu F, Koltun V. Multi-scale context aggregation by dilated convolutions. Proceedings of the 4th International Conference on Learning Representations. San Juan: OpenReview.net, 2016.
    [19] Lin TY, Dollár P, Girshick R, et al. Feature pyramid networks for object detection. Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition. Honolulu: IEEE, 2017. 936–944.
    [20] Cao JX, Chen Q, Guo J, et al. Attention-guided context feature pyramid network for object detection. arXiv:2005.11475, 2020.
    [21] Cooley JW, Lewis PAW, Welch PD. The fast Fourier transform and its applications. IEEE Transactions on Education, 1969, 12(1): 27–34.
    [22] Deng G, Cahill LW. An adaptive Gaussian filter for noise reduction and edge detection. Proceedings of the 1993 IEEE Conference Record Nuclear Science Symposium and Medical Imaging Conference. San Francisco: IEEE, 1993. 1615–1619.
    [23] Yang JW, Li CY, Zhang PC, et al. Focal self-attention for local-global interactions in vision Transformers. arXiv:2107.00641, 2021.
    [24] Voigtman E, Winefordner JD. Low-pass filters for signal averaging. Review of Scientific Instruments, 1986, 57(5): 957–966.
    [25] Lin TY, Goyal P, Girshick R, et al. Focal loss for dense object detection. Proceedings of the 2017 IEEE International Conference on Computer Vision. Venice: IEEE, 2017. 2999–3007.
    [26] Aanæs H, Jensen RR, Vogiatzis G, et al. Large-scale data for multiple-view stereopsis. International Journal of Computer Vision, 2016, 120(2): 153–168.
    [27] Knapitsch A, Park J, Zhou QY, et al. Tanks and temples: Benchmarking large-scale scene reconstruction. ACM Transactions on Graphics (TOG), 2017, 36(4): 78.
    [28] Yao Y, Luo ZX, Li SW, et al. BlendedMVS: A large-scale dataset for generalized multi-view stereo networks. Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Seattle: IEEE, 2020. 1787–1796.
    [29] Seitz SM, Curless B, Diebel J, et al. A comparison and evaluation of multi-view stereo reconstruction algorithms. Proceedings of the 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition. New York: IEEE, 2006.
    [30] Wei ZZ, Zhu QT, Min C, et al. AA-RMVSNet: Adaptive aggregation recurrent multi-view stereo network. Proceedings of the 2021 IEEE/CVF International Conference on Computer Vision. Montreal: IEEE, 2021. 6167–6176.
    [31] Ma XJ, Gong Y, Wang QR, et al. EPP-MVSNet: Epipolar-assembling based depth prediction for multi-view stereo. Proceedings of the 2021 IEEE/CVF International Conference on Computer Vision. Montreal: IEEE, 2021. 5712–5720.
    [32] Wang FJH, Galliani S, Vogel C, et al. PatchmatchNet: Learned multi-view patchmatch stereo. Proceedings of the 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Nashville: IEEE, 2021. 14189–14198.
    [33] Yang JY, Mao W, Alvarez JM, et al. Cost volume pyramid based depth inference for multi-view stereo. Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Seattle: IEEE, 2020. 4876–4885.
    [34] Cheng S, Xu ZX, Zhu SL, et al. Deep stereo using adaptive thin volume representation with uncertainty awareness. Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Seattle: IEEE, 2020. 2521–2531.
    [35] Xi JH, Shi YF, Wang YJ, et al. RayMVSNet: Learning ray-based 1D implicit fields for accurate multi-view stereo. Proceedings of the 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition. New Orleans: IEEE, 2022. 8585–8595.
    [36] Yan JF, Wei ZZ, Yi HW, et al. Dense hybrid recurrent multi-view stereo net with dynamic consistency checking. Proceedings of the 16th European Conference on Computer Vision. Glasgow: Springer, 2020. 674–689.
    [37] Yang JY, Alvarez JM, Liu MM. Non-parametric depth distribution modelling based depth inference for multi-view stereo. Proceedings of the 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition. New Orleans: IEEE, 2022. 8616–8624.
Cite this article:

Chen X, Liu M, Chen JS. MVS method combining context enhancement and image frequency guidance. Computer Systems & Applications, 2025, 34(3): 259–267.

History
  • Received: 2024-09-10
  • Revised: 2024-09-30
  • Published online: 2025-01-21