Multi-person Multi-view 3D Human Pose Estimation Based on Neural Radiance Field

Fund Project: National Natural Science Foundation of China (62272235)


Abstract:

Multi-person multi-view 3D human pose estimation aims to predict the positions of the keypoints of multiple people from multi-view images and is a fundamental problem in computer vision. The lack of depth information and the high computational cost of 3D processing make pose estimation from RGB images complex and prone to inaccuracy, and various solutions have been proposed to address this. Among them, voxel-based methods use camera parameters to build 3D voxel features from multi-view images, but the discrete design of the voxel grid inevitably introduces quantization error. To mitigate this limitation, this paper proposes PoseNeRF, a multi-person multi-view 3D human pose estimation method based on the neural radiance field (NeRF). It is the first method to embed the NeRF structure into multi-person multi-view 3D human pose estimation in an end-to-end differentiable form through dual-branch joint training. PoseNeRF consists of three components: a NeRF branch, a pose branch, and a shared parameter mechanism. Specifically, the NeRF branch is trained on enhanced multi-view image features so that its geometric multi-layer perceptron (G-MLP) can represent the opacity at a given spatial location. The pose branch predicts human center positions and keypoint positions from the 3D voxel features. The shared parameter mechanism uses the opacity provided by the G-MLP shared from the NeRF branch to refine the 3D voxel features. The method is validated through extensive experiments on the CMU Panoptic, Campus, and Shelf datasets. On CMU Panoptic, PoseNeRF improves the AP25 metric over VoxelPose and Faster VoxelPose by 2.1% and 6.0%, respectively, and its MPJPE is 1.4 mm lower than that of Faster VoxelPose. It also improves on VoxelPose on the Campus and Shelf datasets.
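The abstract names two ideas that a small worked example can make concrete: the quantization error of a discrete voxel grid, and MPJPE (mean per-joint position error, reported in mm). The sketch below is illustrative only and is not taken from the paper. The joint coordinates, grid origin, and 100 mm voxel size are assumed values, and hard snapping to voxel centers is a deliberately simplified decoding that actual voxel-based estimators may not use.

```python
import math

def mpjpe(pred, gt):
    """Mean per-joint position error: average Euclidean distance over joints."""
    assert len(pred) == len(gt) and len(gt) > 0
    return sum(math.dist(p, g) for p, g in zip(pred, gt)) / len(gt)

def snap_to_voxel_center(point, origin, voxel_size):
    """Quantize a continuous 3D point to the center of the voxel containing it."""
    return tuple(
        o + (math.floor((x - o) / voxel_size) + 0.5) * voxel_size
        for x, o in zip(point, origin)
    )

# Hypothetical ground-truth joints in millimetres (illustrative values only).
gt_joints = [(103.0, 257.0, 911.0), (-340.0, 80.0, 1502.0), (15.0, -620.0, 44.0)]
origin = (-4000.0, -4000.0, 0.0)  # assumed corner of the capture volume
voxel_size = 100.0                # assumed voxel edge length in mm

# Even a perfect estimator limited to voxel centers incurs this error.
quantized = [snap_to_voxel_center(j, origin, voxel_size) for j in gt_joints]
error = mpjpe(quantized, gt_joints)

# A point can be at most half a voxel diagonal away from its voxel center.
max_error = voxel_size * math.sqrt(3) / 2

print(f"MPJPE from quantization alone: {error:.1f} mm (bound: {max_error:.1f} mm)")
```

Under these assumptions, the per-joint error from snapping alone is bounded by half the voxel diagonal, so it shrinks only as the grid gets finer, which in turn raises the 3D computation cost. That trade-off is the motivation the abstract gives for refining voxel features with a continuous representation.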

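Cite this article

Zou Jie, Lin Haoyue. Multi-person multi-view 3D human pose estimation based on neural radiance field. Computer Systems & Applications, 2025, 34(10): 52-61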

History
  • Received: 2025-02-28
  • Last revised: 2025-03-24
  • Published online: 2025-08-28