Abstract: To address the large number of parameters in the Inception-v3 network, this study proposes an effective gesture image recognition method that achieves high-precision gesture recognition with few model parameters. Building on the structure of Inception-v3, the original Inception module is redesigned to reduce both the number of parameters and the difficulty of learning them, and residual connections are added to preserve the integrity of information while preventing network degradation. An attention mechanism module is introduced so that the model focuses on useful information and down-weights useless information, which also helps, to a certain extent, to prevent overfitting. In addition, up-sampled features are fused with low-level features in the model; the fused features are more discriminative than the original input features, which further improves the accuracy of the model. For the experiments, the ASL sign language dataset and the Bangladesh sign language dataset are each shuffled and divided into a training set and a validation set at a ratio of 4:1. The experimental results indicate that the improved Inception-v3 network has only 1.65 M parameters, and it has higher accuracy and faster convergence speed. Its recognition rates on the ASL sign language dataset and the Bangladesh sign language dataset are 100% and 95.33%, respectively.
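The data preparation mentioned in the abstract (shuffling each dataset, then dividing it into training and validation sets at 4:1) can be illustrated with a minimal sketch. This is not the authors' code: the function name `shuffle_and_split`, the fixed seed, and the placeholder `(file name, label)` pairs are all assumptions made for illustration, and the study's actual loading pipeline is not described here.

```python
import random


def shuffle_and_split(samples, train_parts=4, val_parts=1, seed=0):
    """Shuffle samples and split them into (train, validation) at train_parts:val_parts.

    Hypothetical helper; the seed is an assumption added for reproducibility.
    """
    rng = random.Random(seed)
    shuffled = list(samples)  # copy so the caller's list is left untouched
    rng.shuffle(shuffled)
    # Integer arithmetic avoids floating-point rounding at the cut point.
    cut = len(shuffled) * train_parts // (train_parts + val_parts)
    return shuffled[:cut], shuffled[cut:]


# Placeholder data standing in for (image file, class label) pairs.
samples = [(f"img_{i:04d}.png", i % 10) for i in range(1000)]
train, val = shuffle_and_split(samples)
print(len(train), len(val))
```

With 1000 placeholder samples, the split yields 800 training and 200 validation items, and every sample lands in exactly one of the two sets.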