Abstract: Hand pose estimation plays an important role in human-computer interaction, hand function assessment, virtual reality, and augmented reality. A new hand pose estimation method is therefore proposed to address two challenges: the relatively small proportion of the image that the hand region typically occupies, and the occlusion problem inherent in single-view keypoint detection. The proposed method first extracts the hand region using a semantic segmentation model that incorporates Bayesian convolutional neural networks. Based on the hand localization result, it then applies a new model built on an attention mechanism and a cascade guidance strategy to obtain accurate 2D hand keypoint detections. Next, a deep network based on a stereo vision algorithm computes the depth of each keypoint, with a view self-learning function provided during depth estimation; the algorithm uses triangulation as its foundation, and the RANSAC algorithm corrects the measurement results. Finally, the 3D keypoint detections are refined by multi-task learning and reprojection training, yielding the 3D pose of the hand keypoints. Experimental results show that, compared with representative hand region detection algorithms, the proposed method significantly improves both the average detection precision and the running time for hand regions. Moreover, in terms of the mean end-point error (EPE_mean) and the area under the PCK curve (AUC), the proposed method outperforms the compared pose estimation methods, producing better keypoint detection and thus a better overall hand pose estimation result.
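The triangulation-plus-RANSAC depth step mentioned above can be illustrated with a minimal sketch. This is not the paper's implementation: the function names, the linear (DLT) triangulation, and the simple consensus scheme below are illustrative assumptions about how keypoint depths could be triangulated from two views and then corrected by rejecting outlier measurements.

```python
import numpy as np

def triangulate_dlt(P1, P2, x1, x2):
    """Linear (DLT) triangulation of one 3D point from two views.
    P1, P2: 3x4 camera projection matrices; x1, x2: pixel coordinates."""
    A = np.array([
        x1[0] * P1[2] - P1[0],
        x1[1] * P1[2] - P1[1],
        x2[0] * P2[2] - P2[0],
        x2[1] * P2[2] - P2[1],
    ])
    # The homogeneous solution is the right singular vector of A
    # with the smallest singular value.
    _, _, Vt = np.linalg.svd(A)
    X = Vt[-1]
    return X[:3] / X[3]

def ransac_point(candidates, thresh=0.05, iters=100, seed=0):
    """Toy RANSAC over candidate 3D estimates of the same keypoint
    (e.g. triangulated from different view pairs): pick the candidate
    with the largest consensus set, then average its inliers."""
    rng = np.random.default_rng(seed)
    best_inliers = None
    for _ in range(iters):
        c = candidates[rng.integers(len(candidates))]
        d = np.linalg.norm(candidates - c, axis=1)
        inliers = candidates[d < thresh]
        if best_inliers is None or len(inliers) > len(best_inliers):
            best_inliers = inliers
    return best_inliers.mean(axis=0)

if __name__ == "__main__":
    # Synthetic stereo pair: identical intrinsics, 0.1 m baseline.
    K = np.array([[500.0, 0, 320], [0, 500.0, 240], [0, 0, 1]])
    P1 = K @ np.hstack([np.eye(3), np.zeros((3, 1))])
    P2 = K @ np.hstack([np.eye(3), np.array([[-0.1], [0], [0]])])
    X_true = np.array([0.1, 0.2, 2.0])  # a keypoint 2 m from the cameras
    xh1 = P1 @ np.append(X_true, 1.0)
    xh2 = P2 @ np.append(X_true, 1.0)
    X = triangulate_dlt(P1, P2, xh1[:2] / xh1[2], xh2[:2] / xh2[2])
    print(X)  # recovers X_true up to numerical precision
```

The RANSAC stage only pays off when several candidate measurements of the same keypoint are available (here, triangulations from different view pairs); with noise-free input the DLT solution alone is already exact.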