Abstract: To address the incomplete feature extraction of convolutional neural networks and the insufficient extraction of multiple gesture features, this study proposes a static gesture recognition method based on a residual double attention module and a cross-level feature fusion module. The residual double attention module enhances the low-level features extracted by a ResNet50 network, learns key information effectively, updates the weights, and increases the attention paid to high-level features. The cross-level feature fusion module then fuses high-level and low-level features from different stages to enrich the semantic and location information across levels in the high-level feature map. Finally, a fully connected layer with a Softmax classifier classifies and recognizes the gesture image. Experiments on the American Sign Language (ASL) dataset yield an average recognition accuracy of 99.68%, which is 2.52% higher than that of the baseline ResNet50 network. The results show that the proposed method can fully extract and reuse gesture features and effectively improve the recognition accuracy of gesture images.
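The pipeline summarized above (attention-refined low-level features, cross-level fusion, Softmax classification) can be illustrated with a minimal NumPy sketch. The abstract does not specify the internals of either module, so everything below is an assumption made for illustration: the channel-then-spatial attention layout, the parameter-free spatial map, the 2x2 average pooling used to align resolutions, channel concatenation as the fusion operator, the tensor shapes, the random weights, and the class count of 5 are all hypothetical and are not taken from the paper.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def channel_attention(feat, w1, w2):
    # feat: (C, H, W). Squeeze spatially, then a two-layer bottleneck (assumed design).
    pooled = feat.mean(axis=(1, 2))            # (C,)
    hidden = np.maximum(w1 @ pooled, 0.0)      # ReLU, (C // r,)
    return sigmoid(w2 @ hidden)                # one weight in (0, 1) per channel

def spatial_attention(feat):
    # Assumed parameter-free variant: combine channel-wise mean and max maps.
    # A trained module would typically apply a learned convolution here.
    return sigmoid(feat.mean(axis=0) + feat.max(axis=0))  # (H, W)

def residual_double_attention(feat, w1, w2):
    # Apply channel attention, then spatial attention, then add the input back.
    refined = feat * channel_attention(feat, w1, w2)[:, None, None]
    refined = refined * spatial_attention(refined)[None, :, :]
    return feat + refined                      # residual connection

def fuse_levels(low, high):
    # Assumed fusion: 2x2 average-pool the low-level map to the high-level
    # resolution, then concatenate along the channel axis.
    c, h, w = low.shape
    pooled = low.reshape(c, h // 2, 2, w // 2, 2).mean(axis=(2, 4))
    return np.concatenate([pooled, high], axis=0)

def softmax(logits):
    e = np.exp(logits - logits.max())          # shift for numerical stability
    return e / e.sum()

# Hypothetical shapes and random weights, for demonstration only.
rng = np.random.default_rng(0)
low = rng.standard_normal((8, 16, 16))         # stand-in for an early-stage feature map
high = rng.standard_normal((16, 8, 8))         # stand-in for a late-stage feature map
w1 = rng.standard_normal((2, 8)) * 0.1         # channel reduction 8 -> 2
w2 = rng.standard_normal((8, 2)) * 0.1         # channel expansion 2 -> 8

low_att = residual_double_attention(low, w1, w2)
fused = fuse_levels(low_att, high)             # (24, 8, 8)

num_classes = 5                                # hypothetical; not the ASL class count
w_fc = rng.standard_normal((num_classes, fused.shape[0])) * 0.1
probs = softmax(w_fc @ fused.mean(axis=(1, 2)))  # global average pool + FC + Softmax
predicted_class = int(np.argmax(probs))
```

Because both attention maps lie in (0, 1) and are added back through the residual path, each output value keeps the sign of its input and is scaled by a factor between 1 and 2, so the module in this sketch can emphasize features but never suppress them below their original magnitude.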