3D Human Pose Estimation Based on Multi-layer Spatial Feature Fusion

doi:10.15888/j.cnki.csa.009602

Published in: 2024, Vol. 33, No. 8, pp. 250-256

3D Human Pose Estimation Based on Multi-layer Spatial Feature Fusion
Abstract:

In 3D human pose estimation, the connections between human joints form a complex topology. Modeling this structure with a graph convolutional network effectively captures the relationships between local, adjacent joints. Non-adjacent joints have no direct physical connection, but human motion and pose are subject to biomechanical constraints and to coordination among joints, so using a Transformer encoder to establish contextual relationships between joints allows the pose to be inferred more accurately. In the era of large models, reducing the parameter count while maintaining performance is also particularly important. To address these issues, a multi-layer spatial feature fusion network (MLSFFN) based on graph convolution and Transformer is designed, which effectively fuses local and global spatial features with a relatively small number of parameters. Experimental results show that the proposed method achieves a mean per joint position error (MPJPE) of 49.9 mm on the Human3.6M dataset with only 2.1M parameters. The model also demonstrates strong generalization ability on the MPI-INF-3DHP dataset.
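The MPJPE metric reported above is the average Euclidean distance between predicted and ground-truth joint positions. The following is a minimal sketch of that computation for a single pose, not the authors' evaluation code. The function name `mpjpe`, the `Joint` alias, and the toy coordinates are assumptions made for illustration. Benchmark protocols commonly also align the root joint and average over all frames, and the sketch leaves those steps out.

```python
import math
from typing import Sequence

Joint = Sequence[float]  # one 3D joint position (x, y, z), in millimetres


def mpjpe(predicted: Sequence[Joint], target: Sequence[Joint]) -> float:
    """Mean Euclidean distance between corresponding joints of one pose."""
    if len(predicted) != len(target) or not predicted:
        raise ValueError("poses must be non-empty and have the same number of joints")
    # math.dist returns the Euclidean (L2) distance between two points.
    return sum(math.dist(p, t) for p, t in zip(predicted, target)) / len(predicted)


# Toy 3-joint pose with hypothetical values (not taken from Human3.6M).
target_pose = [(0.0, 0.0, 0.0), (100.0, 0.0, 0.0), (100.0, 200.0, 0.0)]
predicted_pose = [(0.0, 0.0, 0.0), (103.0, 4.0, 0.0), (100.0, 200.0, 12.0)]

error_mm = mpjpe(predicted_pose, target_pose)
print(f"MPJPE: {error_mm:.2f} mm")  # prints "MPJPE: 5.67 mm"
```

In the toy example the per-joint errors are 0 mm, 5 mm, and 12 mm, so the mean is 17/3 mm. A reported figure such as 49.9 mm is the same quantity averaged over every joint and frame in the test set.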

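Cite this article

梁桉源, 肖学中. 3D Human Pose Estimation Based on Multi-layer Spatial Feature Fusion. Computer Systems & Applications (计算机系统应用), 2024, 33(8): 250-256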

History

Received: 2024-02-26
Last revised: 2024-03-28
Published online: 2024-06-28
