Received: February 23, 2022; Revised: April 2, 2022
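Chinese abstract (translated): The Inception-v3 network has too many parameters. To address this, this paper proposes an effective gesture image recognition method that delivers high-precision gesture recognition with a small number of model parameters. Building on the Inception-v3 architecture, the paper makes four changes:

- It redesigns the original Inception modules to reduce both the number of learned parameters and the difficulty of learning.
- It adds residual connections to preserve the integrity of information and prevent network degradation.
- It introduces an attention mechanism module that makes the model focus on useful information and suppress useless information. To some extent, this module also helps prevent overfitting.
- It fuses up-sampled features with low-level features inside the model. The fused features are more discriminative than the original input features, which further improves accuracy.

Experimental results show that the improved Inception-v3 network has only 1.65 M parameters, along with higher accuracy and faster convergence. The ASL sign language dataset and the Bangladesh sign language dataset were each shuffled and then split separately into training and validation sets at a ratio of 4:1. The improved Inception-v3 achieves recognition rates of 100% on the ASL dataset and 95.33% on the Bangladesh sign language dataset.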
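Chinese keywords (translated): sign language recognition; Inception-v3 network; attention mechanism module; up-sampling; feature fusion; deep learning; convolutional neural network (CNN)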
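The abstract states that the redesigned Inception modules need fewer learned parameters, but it does not give their exact layout. One standard way Inception-v3 style modules save parameters is to factorize large convolutions. The sketch below counts weights for two such factorizations:

- a 5×5 convolution replaced by two stacked 3×3 convolutions;
- a 3×3 convolution replaced by a 1×3 convolution followed by a 3×1 convolution.

This is an illustration of the general technique only. It does not reproduce the authors' module or the reported 1.65 M total. The function names and the 64-channel width are hypothetical choices.

```python
from typing import Dict


def conv_params(k_h: int, k_w: int, c_in: int, c_out: int, bias: bool = False) -> int:
    """Learnable parameters of one 2-D convolution.

    Bias defaults to off, as is common when a batch-norm layer follows the convolution.
    """
    return k_h * k_w * c_in * c_out + (c_out if bias else 0)


def compare_factorizations(c: int) -> Dict[str, int]:
    """Parameter counts for Inception-v3 style factorizations at a constant width of c channels."""
    return {
        # One 5x5 layer vs. two stacked 3x3 layers (same 5x5 receptive field).
        "5x5": conv_params(5, 5, c, c),
        "2x3x3": 2 * conv_params(3, 3, c, c),
        # One 3x3 layer vs. a 1x3 layer followed by a 3x1 layer.
        "3x3": conv_params(3, 3, c, c),
        "1x3+3x1": conv_params(1, 3, c, c) + conv_params(3, 1, c, c),
    }


if __name__ == "__main__":
    for name, n in compare_factorizations(64).items():
        print(f"{name:>8}: {n:,}")
```

With 64 input and output channels, the counts are:

- 102,400 for the 5×5 layer vs. 73,728 for two 3×3 layers, a 28% reduction;
- 36,864 for the 3×3 layer vs. 24,576 for the 1×3 + 3×1 pair, one third fewer.

The whole-network figure depends on how many such layers the redesigned modules actually use.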
Abstract: The Inception-v3 network has a large number of parameters. This study proposes an effective gesture image recognition method that achieves high-precision gesture recognition with few model parameters. The method builds on the Inception-v3 architecture in four ways:

- The original Inception modules are redesigned to reduce the number of learned parameters and the difficulty of learning.
- Residual connections are added to preserve the integrity of information and prevent network degradation.
- An attention mechanism module is introduced so that the model focuses on useful information and suppresses useless information. To a certain extent, this also helps prevent overfitting.
- Up-sampled features are fused with low-level features in the model. The fused features are more discriminative than the original input features, which further improves accuracy.

The experimental results indicate that the improved Inception-v3 network has only 1.65 M parameters and achieves higher accuracy and faster convergence. The ASL sign language dataset and the Bangladesh sign language dataset are each shuffled and then split into training and validation sets at a ratio of 4:1. The improved Inception-v3 achieves recognition rates of 100% on the ASL sign language dataset and 95.33% on the Bangladesh sign language dataset.
Keywords: sign language recognition; Inception-v3 network; attention mechanism module; up-sampling; feature fusion; deep learning; convolutional neural network (CNN)
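The abstract names three ingredients: an attention module, residual connections, and fusion of up-sampled features with low-level features. It does not specify which attention design or fusion operator is used. The sketch below assumes the following, all labeled as assumptions:

- the attention is a squeeze-and-excitation (SE) style channel attention;
- the residual is an identity shortcut around that attention branch;
- the fusion is nearest-neighbour 2× up-sampling followed by channel concatenation with the lower-level map.

All function names are hypothetical. Feature maps are plain nested lists (channels × height × width), so the shapes stay visible without a deep-learning framework.

```python
import math
from typing import List

FeatureMap = List[List[List[float]]]  # channels x height x width


def add_maps(a: FeatureMap, b: FeatureMap) -> FeatureMap:
    """Element-wise sum of two feature maps with identical shapes."""
    return [[[x + y for x, y in zip(ra, rb)] for ra, rb in zip(ca, cb)]
            for ca, cb in zip(a, b)]


def dense(x: List[float], w: List[List[float]], b: List[float]) -> List[float]:
    """Fully connected layer; w has shape out_dim x in_dim."""
    return [sum(wi * xi for wi, xi in zip(row, x)) + bi for row, bi in zip(w, b)]


def se_attention(fmap: FeatureMap, w1, b1, w2, b2) -> FeatureMap:
    """Assumed SE-style channel attention: squeeze, excite, rescale."""
    # Squeeze: global average pooling gives one descriptor per channel.
    z = [sum(map(sum, ch)) / (len(ch) * len(ch[0])) for ch in fmap]
    # Excite: ReLU bottleneck, then one sigmoid gate in (0, 1) per channel.
    h = [max(0.0, v) for v in dense(z, w1, b1)]
    gates = [1.0 / (1.0 + math.exp(-v)) for v in dense(h, w2, b2)]
    # Rescale: every value in a channel is multiplied by that channel's gate.
    return [[[g * v for v in row] for row in ch] for g, ch in zip(gates, fmap)]


def residual_attention(x: FeatureMap, w1, b1, w2, b2) -> FeatureMap:
    """Identity shortcut around the attention branch: y = x + F(x)."""
    return add_maps(x, se_attention(x, w1, b1, w2, b2))


def upsample_nearest_2x(fmap: FeatureMap) -> FeatureMap:
    """Nearest-neighbour up-sampling: each value becomes a 2x2 block."""
    out = []
    for ch in fmap:
        rows = []
        for row in ch:
            wide = [v for v in row for _ in range(2)]
            rows.extend([wide, list(wide)])
        out.append(rows)
    return out


def fuse(high: FeatureMap, low: FeatureMap) -> FeatureMap:
    """Up-sample the deep (coarse) map, then concatenate it with the low-level map along channels."""
    up = upsample_nearest_2x(high)
    if (len(up[0]), len(up[0][0])) != (len(low[0]), len(low[0][0])):
        raise ValueError("spatial sizes must match after up-sampling")
    return up + low


if __name__ == "__main__":
    x = [[[1.0, 1.0], [1.0, 1.0]], [[2.0, 2.0], [2.0, 2.0]]]  # 2 channels, 2x2
    w1, w2 = [[0.0, 0.0]], [[0.0], [0.0]]                      # toy weights, bottleneck of 1
    print(se_attention(x, w1, [0.0], w2, [0.0, 0.0]))          # all gates are 0.5
    low = [[[0.0] * 4 for _ in range(4)]]                       # 1 channel, 4x4
    print(len(fuse(x, low)))                                    # 3 channels after concatenation
```

Concatenation is only one plausible fusion operator. Element-wise addition, as in feature pyramid networks, would instead require matching channel counts and could reuse `add_maps`. In a trained network, each branch would also contain learned convolutions rather than the fixed toy weights used here.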
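Foundation items: National Natural Science Foundation of China (51405448); Doctoral Research Start-up Project of Zhejiang Sci-Tech University (11122932611817); National Undergraduate Training Program for Innovation and Entrepreneurship (11120032382104); Undergraduate Science and Technology Innovation Project of Zhejiang Sci-Tech University (11120032662023); Education and Teaching Reform Research Project of the School of Information, Zhejiang Sci-Tech University (11120033312202)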
Citation:
DENG Zhi-Jun (邓志军), TIAN Qiu-Hong (田秋红). Improved Inception-v3 Network for Gesture Image Recognition. Computer Systems Applications (计算机系统应用), 2022, 31(11): 157-166