Human Pose Estimation Based on Stacked Hourglass Network
    Abstract:

    Human pose estimation plays an important role in many computer vision tasks, but it remains challenging because of complex pose variation, illumination changes, occlusion, and low image resolution. High-level semantic information from deep convolutional neural networks offers an effective way to improve estimation accuracy. This study proposes an improved stacked hourglass network. A large-receptive-field residual module and a preprocessing module are designed to better capture the structural features of the human body and to obtain rich contextual information. The network performs well under partial occlusion, large pose changes, and complex backgrounds. In addition, localization accuracy is further improved by fusing the results from different stages. Experiments on the MPII and LSP datasets demonstrate the effectiveness of the model.
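
    The abstract does not say how results from different stages are fused. As a minimal sketch, assuming each stage outputs one heatmap per joint and that fusion is a plain element-wise average (both are assumptions, as are the hypothetical names `fuse_stage_heatmaps` and `decode_joint`), the idea could look like this:

```python
from typing import List, Tuple

Heatmap = List[List[float]]  # one joint's heatmap, indexed [row][col]


def fuse_stage_heatmaps(stage_heatmaps: List[Heatmap]) -> Heatmap:
    """Average the per-stage heatmaps of one joint.

    Averaging is an assumed fusion rule, not the paper's confirmed method.
    """
    if not stage_heatmaps:
        raise ValueError("need at least one stage heatmap")
    rows, cols = len(stage_heatmaps[0]), len(stage_heatmaps[0][0])
    n = len(stage_heatmaps)
    return [
        [sum(h[r][c] for h in stage_heatmaps) / n for c in range(cols)]
        for r in range(rows)
    ]


def decode_joint(heatmap: Heatmap) -> Tuple[int, int]:
    """Return the (row, col) position of the heatmap's peak response."""
    positions = (
        (r, c) for r in range(len(heatmap)) for c in range(len(heatmap[0]))
    )
    return max(positions, key=lambda rc: heatmap[rc[0]][rc[1]])


# Toy data: two hypothetical stages whose peaks disagree.
stage1 = [[0.1, 0.6, 0.0],
          [0.0, 0.5, 0.0],
          [0.0, 0.0, 0.0]]
stage2 = [[0.0, 0.1, 0.0],
          [0.0, 0.9, 0.0],
          [0.0, 0.0, 0.0]]

fused = fuse_stage_heatmaps([stage1, stage2])
print(decode_joint(stage1))  # (0, 1)
print(decode_joint(fused))   # (1, 1)
```

    In this toy case the first stage alone places the joint at (0, 1), while the fused heatmap moves the peak to (1, 1), where both stages respond strongly. A weighted sum or a learned fusion layer would be equally plausible alternatives.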

Citation

Wu JH, Zhou F, Li LL. Human pose estimation based on stacked hourglass network. Computer Systems & Applications, 2021, 30(10): 295–300.

History
  • Received: January 05, 2021
  • Revised: February 03, 2021
  • Online: October 08, 2021
Copyright: Institute of Software, Chinese Academy of Sciences