Monocular 3D Object Detection Based on Fused Sampling and Depth-scale Constraints
Author: Sun Hucheng, Zang Ke
Funding:

Shandong Provincial Natural Science Foundation (ZR2023QF089)

    Abstract:

    Monocular 3D object detection loses accuracy because objects at different depths appear at very different scales in a monocular image. To address this, a detection algorithm based on fused sampling and depth-scale constraints is proposed. First, to strengthen the ability of the sampled features to represent objects at different scales, a multi-scale fusion module (MFM) is constructed: it fuses sampled features across levels and scales through hierarchical and iterative aggregation, improving the extraction of the objects' implicit scale features. In addition, a depth-scale correlation module (DSCM) is constructed, which uses the linear projection constraint between depth and scale to compensatorily rescale objects of different scales to the same feature level, balancing the model's attention across objects at different distances. Quantitative results on the KITTI and Waymo datasets show that, compared with similar algorithms, the proposed algorithm improves the overall average precision AP3D across multiple difficulty levels by 1.56 and 3.07 percentage points, respectively, verifying its effectiveness and generalization. Qualitative results on the two datasets further show that the algorithm significantly mitigates the impact of object scale differences on detection performance.
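The linear depth-scale constraint that the DSCM builds on follows from the pinhole camera model: an object of fixed physical size appears with a pixel size inversely proportional to its depth. A minimal sketch of this idea in plain Python (the function names, the reference depth, and the numeric values are illustrative assumptions, not the paper's implementation):

```python
import math

# Pinhole projection: an object of physical height H (m) at depth Z (m)
# appears with pixel height h = f * H / Z, so apparent scale is linear
# in 1/Z. This is the depth-scale relation referred to in the abstract.
def apparent_height(f_px, real_height_m, depth_m):
    return f_px * real_height_m / depth_m

# Illustrative compensatory rescaling: weight a feature response by its
# depth relative to a reference depth, so near (large) and far (small)
# instances of the same object land at a comparable feature level.
def depth_scale_compensation(response, depth_m, ref_depth_m=20.0):
    return response * (depth_m / ref_depth_m)

f = 721.5  # focal length in pixels (KITTI-like value, an assumption)
h_near = apparent_height(f, 1.5, 10.0)  # a 1.5 m tall car at 10 m
h_far = apparent_height(f, 1.5, 40.0)   # the same car at 40 m
assert math.isclose(h_near / h_far, 4.0)  # 4x the depth -> 1/4 the pixels

# The far object's weaker response is amplified to match the near one's:
near = depth_scale_compensation(1.0, 10.0)   # 1.0 * 10/20 = 0.5
far = depth_scale_compensation(0.25, 40.0)   # 0.25 * 40/20 = 0.5
assert math.isclose(near, far)
```

Here the response gap that depth induces is cancelled by a linear factor in depth; the paper's DSCM applies this compensation at the feature level rather than to scalar responses as in this toy example.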

    References
    [1] Yu YC, Li MX. Object detection for autonomous vehicles based on improved YOLOv5s. Computer Systems & Applications, 2023, 32(9): 97–105. (in Chinese)
    [2] Mao JG, Shi SS, Wang XG, et al. 3D object detection for autonomous driving: A comprehensive survey. International Journal of Computer Vision, 2023, 131(8): 1909–1963.
    [3] Li XY, Ye ZH, Wei SK, et al. Survey of image-based 3D object detection for autonomous driving: Benchmarks, constraints, and error analysis. Journal of Image and Graphics, 2023, 28(6): 1709–1740. (in Chinese)
    [4] Hu SM, Zhang FL, Wang M, et al. PatchNet: A patch-based image representation for interactive library-driven image editing. ACM Transactions on Graphics (TOG), 2013, 32(6): 196.
    [5] Wang L, Du L, Ye XQ, et al. Depth-conditioned dynamic message propagation for monocular 3D object detection. Proceedings of the 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Nashville: IEEE, 2021. 454–463.
    [6] Reading C, Harakeh A, Chae J, et al. Categorical depth distribution network for monocular 3D object detection. Proceedings of the 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Nashville: IEEE, 2021. 8555–8564.
    [7] Brazil G, Pons-Moll G, Liu XM, et al. Kinematic 3D object detection in monocular video. Proceedings of the 16th European Conference on Computer Vision. Glasgow: Springer, 2020. 135–152.
    [8] Zhang YP, Lu JW, Zhou J. Objects are different: Flexible monocular 3D object detection. Proceedings of the 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Nashville: IEEE, 2021. 3288–3297.
    [9] Huang KC, Wu TH, Su HT, et al. MonoDTR: Monocular 3D object detection with depth-aware transformer. Proceedings of the 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition. New Orleans: IEEE, 2022. 4002–4011.
    [10] Liu XP, Xue N, Wu TF. Learning auxiliary monocular contexts helps monocular 3D object detection. Proceedings of the 36th AAAI Conference on Artificial Intelligence. Palo Alto: AAAI Press, 2022. 1810–1818.
    [11] Lu Y, Ma XZ, Yang L, et al. Geometry uncertainty projection network for monocular 3D object detection. Proceedings of the 2021 IEEE/CVF International Conference on Computer Vision. Montreal: IEEE, 2021. 3091–3101.
    [12] Yan C, Salman E. Mono3D: Open source cell library for monolithic 3-D integrated circuits. IEEE Transactions on Circuits and Systems I: Regular Papers, 2018, 65(3): 1075–1085.
    [13] Kundu A, Li Y, Rehg JM. 3D-RCNN: Instance-level 3D object reconstruction via render-and-compare. Proceedings of the 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Salt Lake City: IEEE, 2018. 3559–3568.
    [14] Brazil G, Liu XM. M3D-RPN: Monocular 3D region proposal network for object detection. Proceedings of the 2019 IEEE/CVF International Conference on Computer Vision. Seoul: IEEE, 2019. 9286–9295.
    [15] Sun YK, Wang XZ, Feng A, et al. Monocular 3D detection algorithm based on multi-scale fusion and high-order interaction. Computer Technology and Development, 2024, 34(10): 38–45. (in Chinese)
    [16] Sun X, Feng RF, Chen YR. Monocular 3D object detection method based on fusion of depth and instance segmentation. Journal of Computer Applications, 2024, 44(7): 2208–2215. (in Chinese)
    [17] Yu F, Wang DQ, Shelhamer E, et al. Deep layer aggregation. Proceedings of the 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Salt Lake City: IEEE, 2018. 2403–2412.
    [18] Dai JF, Qi HZ, Xiong YW, et al. Deformable convolutional networks. Proceedings of the 2017 IEEE International Conference on Computer Vision. Venice: IEEE, 2017. 764–773.
    [19] Wu YX, He KM. Group normalization. Proceedings of the 15th European Conference on Computer Vision. Munich: Springer, 2018. 3–19.
    [20] Ioffe S, Szegedy C. Batch normalization: Accelerating deep network training by reducing internal covariate shift. Proceedings of the 32nd International Conference on Machine Learning. Lille: JMLR.org, 2015. 448–456.
    [21] Tang YL, Dorn S, Savani C. Center3D: Center-based monocular 3D object detection with joint depth understanding. Proceedings of the 42nd DAGM German Conference on Pattern Recognition. Tübingen: Springer, 2021. 289–302.
    [22] Geiger A, Lenz P, Urtasun R. Are we ready for autonomous driving? The KITTI vision benchmark suite. Proceedings of the 2012 IEEE Conference on Computer Vision and Pattern Recognition. Providence: IEEE, 2012. 3354–3361.
    [23] Sun P, Kretzschmar H, Dotiwalla X, et al. Scalability in perception for autonomous driving: Waymo open dataset. Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Seattle: IEEE, 2020. 2446–2454.
    [24] Kumar A, Brazil G, Corona E, et al. DEVIANT: Depth EquiVarIAnt network for monocular 3D object detection. Proceedings of the 17th European Conference on Computer Vision. Tel Aviv: Springer, 2022. 664–683.
    [25] Chong ZY, Ma XZ, Zhang H, et al. MonoDistill: Learning spatial features for monocular 3D object detection. Proceedings of the 10th International Conference on Learning Representations. OpenReview.net, 2022.
    [26] Simonelli A, Bulò SR, Porzi L, et al. Are we missing confidence in pseudo-LiDAR methods for monocular 3D object detection? Proceedings of the 2021 IEEE/CVF International Conference on Computer Vision. Montreal: IEEE, 2021. 3225–3233.
    [27] Park D, Ambruş R, Guizilini V, et al. Is pseudo-lidar needed for monocular 3D object detection? Proceedings of the 2021 IEEE/CVF International Conference on Computer Vision. Montreal: IEEE, 2021. 3142–3152.
    [28] Cao YZH, Zhang H, Li YD, et al. CMAN: Leaning global structure correlation for monocular 3D object detection. IEEE Transactions on Intelligent Transportation Systems, 2022, 23(12): 24727–24737.
    [29] Shi YG. SVDM: Single-view diffusion model for pseudo-stereo 3D object detection. arXiv:2307.02270, 2023.
    [30] Liu YX, Yuan YX, Liu M. Ground-aware monocular 3D object detection for autonomous driving. IEEE Robotics and Automation Letters, 2021, 6(2): 919–926.
    [31] Yao HD, Chen J, Wang Z, et al. Occlusion-aware plane-constraints for monocular 3D object detection. IEEE Transactions on Intelligent Transportation Systems, 2024, 25(5): 4593–4605.
    [32] Lian Q, Li PL, Chen XZ. MonoJSG: Joint semantic and geometric cost volume for monocular 3D object detection. Proceedings of the 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition. New Orleans: IEEE, 2022. 1070–1079.
Cite this article

Sun HC, Zang K. Monocular 3D object detection based on fused sampling and depth-scale constraints. Computer Systems & Applications, 2025, 34(4): 34–44. (in Chinese)

History
  • Received: 2024-09-24
  • Revised: 2024-11-07
  • Published online: 2025-02-28