
Efficient View Transformation for Autonomous Driving
Author: Liu Jiahui, Guan Jingchao, Fang Hongqing, Chao Jianshu
Affiliation:

Fund Project: Collaborative Innovation Platform Project of the Fuzhou-Xiamen-Quanzhou National Independent Innovation Demonstration Zone (2023FX0002)


    Abstract:

    In autonomous driving, 3D object detection from the bird's eye view (BEV) has attracted significant attention. Existing camera-to-BEV view transformation methods suffer from insufficient real-time performance and high deployment complexity. To address these issues, this study proposes a simple and efficient view transformation method that can be deployed without any special engineering operations. First, since complete image features contain substantial redundant information, a width feature extractor, aided by an auxiliary monocular 3D detection task, is introduced to distill the key features of the image while minimizing information loss. Second, a feature-guided polar positional encoding is proposed to strengthen the mapping between the camera view and the BEV representation and to improve the model's spatial understanding. Finally, a single-layer cross-attention mechanism enables interaction between learnable BEV embeddings and the width image features, producing high-quality BEV features. Experimental results on the nuScenes validation set show that, compared with LSS (lift, splat, shoot), the proposed architecture improves mAP from 29.5% to 32.0% (a relative gain of 8.5%) and NDS from 37.1% to 38.0% (a relative gain of 2.4%), demonstrating its effectiveness for 3D object detection in autonomous driving scenarios. It also reduces latency by 41.12% compared with LSS.
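    The three components named in the abstract lend themselves to a compact sketch. The PyTorch code below illustrates one plausible reading, not the paper's actual implementation: the width feature extractor is taken to be height-axis pooling over backbone features, the polar positional encoding is a sinusoidal encoding of each BEV cell's azimuth and radius gated by query content ("feature-guided"), and the view transformation is a single nn.MultiheadAttention layer in which learnable BEV queries attend to the concatenated width features of all cameras. All class names, shapes, and hyperparameters here are assumptions.

# A minimal sketch of the view transformation described in the abstract.
# The height-pooling reading of the "width feature extractor", the gated
# polar encoding, and all shapes are assumptions, not the paper's design.
import math
import torch
import torch.nn as nn


class WidthFeatureExtractor(nn.Module):
    """Collapses the image-height axis so each camera contributes one
    feature column per image-width position (assumed interpretation)."""

    def __init__(self, channels: int):
        super().__init__()
        self.proj = nn.Conv1d(channels, channels, kernel_size=1)

    def forward(self, feats: torch.Tensor) -> torch.Tensor:
        # feats: (B, C, H, W) per-camera image features from the backbone.
        x = feats.mean(dim=2)          # pool over height -> (B, C, W)
        return self.proj(x)            # lightweight refinement


def polar_positional_encoding(bev_size: int, channels: int) -> torch.Tensor:
    """Sinusoidal encoding of each BEV cell's polar coordinates
    (azimuth angle and radius) relative to the ego vehicle."""
    ys, xs = torch.meshgrid(
        torch.linspace(-1.0, 1.0, bev_size),
        torch.linspace(-1.0, 1.0, bev_size),
        indexing="ij",
    )
    theta = torch.atan2(ys, xs)                # azimuth in [-pi, pi]
    radius = torch.sqrt(xs ** 2 + ys ** 2)     # distance from ego
    n_freq = channels // 4                     # sin/cos for two coordinates
    freqs = torch.exp(torch.arange(n_freq) * (-math.log(10000.0) / n_freq))
    enc = torch.cat(
        [
            torch.sin(theta[..., None] * freqs), torch.cos(theta[..., None] * freqs),
            torch.sin(radius[..., None] * freqs), torch.cos(radius[..., None] * freqs),
        ],
        dim=-1,
    )                                          # (bev, bev, channels)
    return enc.reshape(bev_size * bev_size, channels)


class CrossAttentionViewTransform(nn.Module):
    """Single-layer cross-attention: learnable BEV queries attend to the
    width features of all cameras to produce the BEV feature map."""

    def __init__(self, channels: int = 256, bev_size: int = 128, num_heads: int = 8):
        super().__init__()
        self.bev_size = bev_size
        self.bev_queries = nn.Parameter(torch.randn(bev_size * bev_size, channels))
        self.register_buffer("polar_pe", polar_positional_encoding(bev_size, channels))
        # "Feature-guided": the static polar encoding is modulated by content.
        self.pe_gate = nn.Linear(channels, channels)
        self.attn = nn.MultiheadAttention(channels, num_heads, batch_first=True)

    def forward(self, width_feats: torch.Tensor) -> torch.Tensor:
        # width_feats: (B, C, N_cams * W) width features of all cameras.
        kv = width_feats.permute(0, 2, 1)                 # (B, L, C)
        q = self.bev_queries.unsqueeze(0).expand(kv.size(0), -1, -1)
        q = q + torch.sigmoid(self.pe_gate(q)) * self.polar_pe
        bev, _ = self.attn(q, kv, kv)                     # (B, HW, C)
        return bev.permute(0, 2, 1).reshape(
            kv.size(0), -1, self.bev_size, self.bev_size
        )


if __name__ == "__main__":
    feats = torch.randn(2, 256, 16, 44)          # one camera's backbone features
    width = WidthFeatureExtractor(256)(feats)    # (2, 256, 44)
    bev = CrossAttentionViewTransform(256, bev_size=64)(width)
    print(bev.shape)                             # torch.Size([2, 256, 64, 64])

    Note that the cross-attention here runs over W columns per camera rather than H×W pixels, so the key/value sequence is short; under the assumed reading, this is what makes the single attention layer cheap and is consistent with the latency reduction the abstract reports.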

Cite this article

Liu JH, Guan JC, Fang HQ, Chao JS. Efficient view transformation for autonomous driving. Computer Systems & Applications: 1-9

History
  • Received: 2024-07-12
  • Revised: 2024-08-01
  • Accepted:
  • Published online: 2024-11-15
  • Published in print: