Abstract: Human pose estimation based on deep learning is widely used in pose recognition, human-computer interaction, and other fields. To improve the detection accuracy of human body keypoints, many networks adopt architectures with ever-increasing computational cost, parameter count, and complexity, so they cannot be deployed directly on low-computing devices. To address this problem, this study proposes a lightweight method based on multi-branch feature attention fusion. The model is a lightweight redesign of the HigherHRNet network and is trained accordingly. Specifically, channel splitting and channel shuffling are adopted to overcome the information isolation between feature layers that follows group convolution; features are generated by linear operations to reduce the redundancy between different feature layers; and attention information is fused to alleviate the accuracy drop caused by the lightweight design. Training, testing, visualization, and ablation experiments are carried out on the MS COCO dataset. The experimental results show that the proposed lightweight method significantly reduces the computational cost of human pose estimation while maintaining detection accuracy.
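As an illustration of the channel splitting and channel shuffling mentioned above, the following is a minimal sketch in plain Python. It assumes the ShuffleNet-style formulation (view the channels as a groups-by-channels-per-group grid, transpose it, and flatten it again); the abstract does not state the exact operations used in the model, and the function names `channel_split` and `channel_shuffle` are hypothetical. Channels are represented as list items so that the reordering is easy to inspect.

```python
def channel_split(channels, ratio=0.5):
    """Split a list of channels into two branches.

    `ratio` is the fraction of channels sent to the first branch
    (an assumed parameter; 0.5 gives an even split).
    """
    k = int(len(channels) * ratio)
    return channels[:k], channels[k:]


def channel_shuffle(channels, groups):
    """Interleave channels across groups (ShuffleNet-style, assumed here).

    Conceptually: reshape to (groups, per_group), transpose to
    (per_group, groups), then flatten. After the shuffle, each new
    group contains channels drawn from every original group, so a
    following group convolution can mix information across groups.
    """
    n = len(channels)
    if n % groups != 0:
        raise ValueError("number of channels must be divisible by groups")
    per_group = n // groups
    return [
        channels[g * per_group + i]
        for i in range(per_group)
        for g in range(groups)
    ]


if __name__ == "__main__":
    feats = list(range(6))  # six channels, labelled 0..5
    left, right = channel_split(feats)
    print(left, right)
    print(channel_shuffle(feats, groups=2))
```

With six channels in two groups, the shuffle places channels from the first and second group alternately, which is what lets information cross group boundaries. In a real network the same index permutation would be applied along the channel axis of a feature tensor.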