Improved Inception-v3 Network for Gesture Image Recognition

doi:10.15888/j.cnki.csa.008793

2022, Vol. 31, No. 11, pp. 157-166

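Authors: DENG Zhi-Jun, TIAN Qiu-Hong

Affiliation: School of Information Science and Technology, Zhejiang Sci-Tech University, Hangzhou 310018, China

Funding: National Natural Science Foundation of China (51405448); Zhejiang Sci-Tech University Doctoral Research Start-up Project (11122932611817); National College Students' Innovation and Entrepreneurship Training Program (11120032382104); Zhejiang Sci-Tech University Student Science and Technology Innovation Project (11120032662023); Education and Teaching Reform Research Project of the School of Information Science and Technology, Zhejiang Sci-Tech University (11120033312202)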

Abstract:

The Inception-v3 network has a large number of parameters. To address this, this study proposes a gesture image recognition method that achieves high recognition accuracy with few model parameters. Building on the Inception-v3 architecture, the method redesigns the original Inception module to reduce both the number of parameters and the difficulty of learning them. Residual connections preserve the integrity of information and prevent network degradation. An attention mechanism module makes the model focus on useful information and suppress irrelevant information, which also mitigates overfitting to some extent. In addition, feature maps are up-sampled and fused with low-level features; the fused features are more discriminative than the original input features, which further improves accuracy. The ASL sign language dataset and the Bangladesh sign language dataset are each shuffled and then split into a training set and a validation set at a ratio of 4:1. Experimental results show that the improved Inception-v3 network has only 1.65 M parameters while achieving higher accuracy and faster convergence. Its recognition rates on the ASL and Bangladesh sign language datasets reach 100% and 95.33%, respectively.
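The abstract does not say how the redesigned Inception module cuts its parameter count. As a rough illustration only, the sketch below counts the parameters of a standard 2-D convolution and compares three layouts using the factorizations known from the original Inception-v3 design (one 5x5 convolution, two stacked 3x3 convolutions, and a 1x3 followed by a 3x1 convolution). The channel width `C` and the helper `conv_params` are hypothetical and are not taken from the paper.

```python
def conv_params(k_h, k_w, c_in, c_out, bias=True):
    """Learnable parameters of a standard 2-D convolution layer."""
    weights = k_h * k_w * c_in * c_out
    return weights + (c_out if bias else 0)


# Hypothetical channel width, chosen only for illustration.
C = 64

# One 5x5 convolution.
single_5x5 = conv_params(5, 5, C, C)

# Two stacked 3x3 convolutions covering the same 5x5 receptive field.
stacked_3x3 = 2 * conv_params(3, 3, C, C)

# A 1x3 convolution followed by a 3x1 convolution (asymmetric factorization
# of a single 3x3 convolution).
asymmetric_3x3 = conv_params(1, 3, C, C) + conv_params(3, 1, C, C)

print("5x5:", single_5x5)
print("two 3x3:", stacked_3x3)
print("1x3 + 3x1:", asymmetric_3x3)
```

Counts like these show why replacing large kernels with smaller or asymmetric ones shrinks a module; whether the improved network uses these particular factorizations is an assumption here, not a statement from the abstract.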

Keywords: sign language recognition; Inception-v3 network; attention mechanism module; up-sampling; feature fusion; deep learning; convolutional neural network (CNN)
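The data preparation described in the abstract (shuffle each dataset, then split it 4:1 into training and validation sets) can be sketched as follows. This is a minimal illustration using only the Python standard library; the function name `shuffle_and_split`, the fixed seed, and the placeholder file names and labels are assumptions, not details from the paper.

```python
import random


def shuffle_and_split(samples, train_parts=4, val_parts=1, seed=0):
    """Shuffle the samples, then split them train:val = train_parts:val_parts."""
    items = list(samples)               # copy so the caller's data is untouched
    random.Random(seed).shuffle(items)  # seeded for a reproducible split
    cut = len(items) * train_parts // (train_parts + val_parts)
    return items[:cut], items[cut:]


# Hypothetical dataset: (image file name, class label) pairs.
dataset = [(f"img_{i:03d}.png", i % 5) for i in range(100)]

train_set, val_set = shuffle_and_split(dataset)
print(len(train_set), len(val_set))
```

Each dataset would be passed through such a function separately, so that the two sign language datasets are never mixed.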

Cite this article

DENG Zhi-Jun, TIAN Qiu-Hong. Improved Inception-v3 network for gesture image recognition. Computer Systems & Applications, 2022, 31(11): 157-166

History

Received: 2022-02-23
Last revised: 2022-04-02
Published online: 2022-07-14
