Abstract: Human pose estimation plays an important role in many computer vision tasks. However, it remains challenging because of large pose variation, changes in illumination, occlusion, and low image resolution. The high-level semantic information extracted by deep convolutional neural networks offers an effective way to improve the accuracy of human pose estimation. In this study, an improved stacked hourglass network is proposed. A large-receptive-field residual module and a preprocessing module are designed to better capture the structural features of the human body, so that rich contextual information can be obtained. The network performs well under partial occlusion, large pose changes, and complex backgrounds. In addition, localization accuracy is further improved by fusing the results from different stages. Experiments on the MPII and LSP data sets demonstrate the effectiveness of the model.
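
To make the idea of fusing results from different stages concrete, the following is a minimal sketch and not the method used in this study. The abstract does not state the fusion rule, so the sketch assumes that each stage outputs one heatmap per joint and that fusion is a plain element-wise average. The names `fuse_stage_heatmaps` and `locate_joint` are hypothetical and chosen only for illustration.

```python
from typing import List, Tuple

# One joint's heatmap, indexed as heatmap[row][col].
Heatmap = List[List[float]]


def fuse_stage_heatmaps(stage_heatmaps: List[Heatmap]) -> Heatmap:
    """Average the heatmaps that several stages predict for one joint.

    Assumption: equal-weight averaging. The actual fusion rule is not
    given in the abstract and may differ.
    """
    if not stage_heatmaps:
        raise ValueError("need at least one stage heatmap")
    n = len(stage_heatmaps)
    rows = len(stage_heatmaps[0])
    cols = len(stage_heatmaps[0][0])
    return [
        [sum(h[r][c] for h in stage_heatmaps) / n for c in range(cols)]
        for r in range(rows)
    ]


def locate_joint(heatmap: Heatmap) -> Tuple[int, int]:
    """Return the (row, col) position of the largest heatmap value."""
    positions = (
        (r, c) for r in range(len(heatmap)) for c in range(len(heatmap[0]))
    )
    return max(positions, key=lambda rc: heatmap[rc[0]][rc[1]])


# Toy example: two stages disagree on where the joint is.
stage1 = [[0.1, 0.6], [0.2, 0.1]]
stage2 = [[0.1, 0.2], [0.8, 0.1]]
fused = fuse_stage_heatmaps([stage1, stage2])
joint_position = locate_joint(fused)
```

In this toy case the first stage alone would place the joint at row 0, column 1, while the averaged heatmap places it at row 1, column 0, because the second stage is more confident there. A weighted average or a learned fusion layer would be equally plausible alternatives.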