基于深度学习的二维人体姿态估计算法综述
作者: 马双双, 王佳, 曹少中, 杨树林, 赵伟, 张寒
基金项目: 北京市自然基金和北京市教委联合项目 (KZ202010015021); 北京印刷学院科研项目 (Ec202002, Eb202103); 北京印刷学院博士启动基金 (27170120003/021); 北京市教育委员会科研计划 (KM201910015003, KM201610015001)


Overview on Two-dimensional Human Pose Estimation Methods Based on Deep Learning
Author:
    摘要:

    二维人体姿态估计作为人体动作识别的基础, 随着深度学习和神经网络的流行, 已成为备受学者关注的研究热点. 与传统方法相比, 深度学习能够提取更深层的图像特征, 对数据的表达更准确, 因此已成为研究的主流方向. 本文主要介绍二维人体姿态估计算法: 首先根据检测人数将算法分为单人姿态估计与多人姿态估计两类; 其次将单人姿态估计分为基于坐标回归与基于热图检测的方法, 将多人姿态估计分为自顶向下 (top-down) 与自底向上 (bottom-up) 的方法; 最后介绍姿态估计常用数据集与评价指标, 对部分多人姿态估计算法的性能指标进行了对比, 并对人体姿态估计研究面临的问题与发展趋势进行了阐述.

    Abstract:

    As the basis of human motion recognition, two-dimensional human pose estimation has become a research hotspot with the rise of deep learning and neural networks. Compared with traditional methods, deep learning extracts deeper image features and represents the data more accurately, and it has therefore become the mainstream direction of research. This study mainly introduces two-dimensional human pose estimation algorithms. Firstly, according to the number of people detected, the algorithms are divided into two categories: single-person and multi-person pose estimation. Secondly, single-person pose estimation methods are grouped into coordinate-regression-based and heatmap-detection-based approaches, while multi-person pose estimation methods are divided into top-down and bottom-up ones. Finally, the study introduces commonly used datasets and evaluation metrics for human pose estimation, compares the performance of some multi-person pose estimation algorithms, and discusses the challenges and development trends of human pose estimation research.
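    The single-person taxonomy above distinguishes coordinate-regression methods, which map an image directly to joint coordinates, from heatmap-detection methods, which predict one confidence map per joint and read each coordinate off the map's peak. As a purely illustrative sketch of that decoding step (NumPy only; not the implementation of any paper surveyed here):

```python
import numpy as np

def decode_heatmaps(heatmaps):
    """Recover (x, y) keypoint coordinates from per-joint heatmaps.

    heatmaps: array of shape (num_joints, H, W), one confidence map
    per keypoint, as produced by detection-based estimators.
    Returns (num_joints, 2) peak locations and (num_joints,) confidences.
    """
    num_joints, h, w = heatmaps.shape
    flat = heatmaps.reshape(num_joints, -1)
    idx = flat.argmax(axis=1)            # flat index of each joint's peak
    conf = flat.max(axis=1)              # peak confidence per joint
    ys, xs = np.divmod(idx, w)           # row = idx // w, col = idx % w
    coords = np.stack([xs, ys], axis=1).astype(float)
    return coords, conf
```

    In practice a sub-pixel refinement (e.g., a quarter-pixel shift toward the strongest neighboring cell) is usually applied after the argmax; it is omitted here for brevity.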

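    Among the evaluation metrics the abstract refers to, the COCO benchmark's object keypoint similarity (OKS) underlies the AP numbers typically reported for multi-person methods: each predicted joint contributes a Gaussian similarity falling off with its distance to the ground truth, normalized by object scale. A minimal sketch of the standard formula (variable names are mine):

```python
import numpy as np

def oks(pred, gt, vis, area, k):
    """Object keypoint similarity, the per-instance score behind COCO keypoint AP.

    pred, gt: (num_joints, 2) predicted and ground-truth (x, y) positions.
    vis:      (num_joints,) ground-truth visibility; only v > 0 joints count.
    area:     object segment area (the s^2 scale term).
    k:        (num_joints,) per-joint falloff constants fixed by the benchmark.
    """
    d2 = ((pred - gt) ** 2).sum(axis=1)                 # squared pixel distances
    sim = np.exp(-d2 / (2.0 * area * k ** 2 + np.spacing(1)))
    labeled = vis > 0
    return float(sim[labeled].mean()) if labeled.any() else 0.0
```

    AP is then computed by thresholding OKS (e.g., at 0.50:0.05:0.95) exactly as IoU is thresholded in object detection.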
引用本文: 马双双, 王佳, 曹少中, 杨树林, 赵伟, 张寒. 基于深度学习的二维人体姿态估计算法综述. 计算机系统应用, 2022, 31(10): 36–43.

历史
  • 收稿日期:2021-12-20
  • 最后修改日期:2022-01-18
  • 在线发布日期: 2022-06-24