基于深度学习的二维人体姿态估计算法综述
作者: 马双双, 王佳, 曹少中, 杨树林, 赵伟, 张寒
基金项目: 北京市自然基金和北京市教委联合项目 (KZ202010015021); 北京印刷学院科研项目 (Ec202002, Eb202103); 北京印刷学院博士启动基金 (27170120003/021); 北京市教育委员会科研计划 (KM201910015003, KM201610015001)


Overview on Two-dimensional Human Pose Estimation Methods Based on Deep Learning
Author:
    摘要:

    二维人体姿态估计作为人体动作识别的基础, 随着深度学习和神经网络的流行, 已成为备受学者关注的研究热点. 与传统方法相比, 深度学习能够提取更深层的图像特征, 对数据的表达更准确, 因此已成为研究的主流方向. 本文主要介绍二维人体姿态估计算法: 首先根据检测人数将算法分为单人姿态估计与多人姿态估计两类; 其次将单人姿态估计分为基于坐标回归与基于热图检测的方法, 将多人姿态估计分为自顶向下 (top-down) 与自底向上 (bottom-up) 的方法; 最后介绍姿态估计常用数据集与评价指标, 对部分多人姿态估计算法的性能指标进行了对比, 并对人体姿态估计研究面临的问题与发展趋势进行了阐述.

    Abstract:

    As the basis of human motion recognition, two-dimensional human pose estimation has become a research hotspot with the rise of deep learning and neural networks. Compared with traditional methods, deep learning extracts deeper image features and represents the data more accurately, and it has therefore become the mainstream direction of research. This study mainly introduces two-dimensional human pose estimation algorithms. Firstly, according to the number of people detected, the algorithms are divided into two categories: single-person and multi-person pose estimation. Secondly, single-person pose estimation methods are grouped into coordinate-regression-based and heatmap-detection-based approaches, while multi-person pose estimation methods are divided into top-down and bottom-up ones. Finally, the study introduces commonly used datasets and evaluation metrics for human pose estimation, compares the performance of some multi-person pose estimation algorithms, and discusses the challenges and development trends of human pose estimation research.
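    The single-person taxonomy above distinguishes coordinate-regression methods, which map an image directly to joint coordinates, from heatmap-detection methods, which predict one confidence map per joint and read each coordinate off the map's peak. As a purely illustrative sketch of that decoding step (NumPy only; not the implementation of any paper surveyed here):

```python
import numpy as np

def decode_heatmaps(heatmaps):
    """Recover (x, y) keypoint coordinates from per-joint heatmaps.

    heatmaps: array of shape (num_joints, H, W), one confidence map
    per keypoint, as produced by detection-based estimators.
    Returns (num_joints, 2) peak locations and (num_joints,) confidences.
    """
    num_joints, h, w = heatmaps.shape
    flat = heatmaps.reshape(num_joints, -1)
    idx = flat.argmax(axis=1)            # flat index of each joint's peak
    conf = flat.max(axis=1)              # peak confidence per joint
    ys, xs = np.divmod(idx, w)           # row = idx // w, col = idx % w
    coords = np.stack([xs, ys], axis=1).astype(float)
    return coords, conf
```

    In practice a sub-pixel refinement (e.g., a quarter-pixel shift toward the strongest neighboring cell) is usually applied after the argmax; it is omitted here for brevity.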

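    Among the evaluation metrics the abstract refers to, the COCO benchmark's object keypoint similarity (OKS) underlies the AP numbers typically reported for multi-person methods: each predicted joint contributes a Gaussian similarity falling off with its distance to the ground truth, normalized by object scale. A minimal sketch of the standard formula (variable names are mine):

```python
import numpy as np

def oks(pred, gt, vis, area, k):
    """Object keypoint similarity, the per-instance score behind COCO keypoint AP.

    pred, gt: (num_joints, 2) predicted and ground-truth (x, y) positions.
    vis:      (num_joints,) ground-truth visibility; only v > 0 joints count.
    area:     object segment area (the s^2 scale term).
    k:        (num_joints,) per-joint falloff constants fixed by the benchmark.
    """
    d2 = ((pred - gt) ** 2).sum(axis=1)                 # squared pixel distances
    sim = np.exp(-d2 / (2.0 * area * k ** 2 + np.spacing(1)))
    labeled = vis > 0
    return float(sim[labeled].mean()) if labeled.any() else 0.0
```

    AP is then computed by thresholding OKS (e.g., at 0.50:0.05:0.95) exactly as IoU is thresholded in object detection.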
引用本文: 马双双, 王佳, 曹少中, 杨树林, 赵伟, 张寒. 基于深度学习的二维人体姿态估计算法综述. 计算机系统应用, 2022, 31(10): 36–43.

历史
  • 收稿日期:2021-12-20
  • 最后修改日期:2022-01-18
  • 在线发布日期: 2022-06-24