Fast Skeleton-based Hand Gesture Recognition Model

Funding: National Key Research and Development Program of China (2020YFB1313602)
Abstract:

Skeleton data is a compact data modality that is robust to environmental conditions for hand gesture recognition. Recent studies of skeleton-based hand gesture recognition mostly use deep neural networks to extract spatial and temporal information, but these methods tend to suffer from complicated computation and large numbers of model parameters. To address this problem, this study presents a lightweight and efficient hand gesture recognition model. The model takes two spatial geometric features calculated from skeleton sequences, together with automatically learned motion trajectory features, and performs gesture classification with convolutional networks alone as its backbone. The model has as few as 0.16 M parameters in its smallest configuration, and its computational complexity is at most 0.03 GFLOPs. Evaluated on two public datasets, the method outperforms the other methods that take the skeleton modality as input, achieving the best reported results on both datasets.
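The abstract does not specify which two spatial geometric features the model computes from the skeleton sequence. As a purely illustrative sketch, pairwise joint distances and angles between consecutive bones are two common geometric descriptors for hand skeletons; the joint count, bone list, and feature choices below are assumptions, not the paper's actual definitions.

```python
import numpy as np

# Hypothetical skeleton sequence: T frames, J hand joints, 3D coordinates.
# J = 22 matches the hand model used by datasets such as SHREC'17.
T, J = 32, 22
rng = np.random.default_rng(0)
seq = rng.standard_normal((T, J, 3))

def joint_distances(frame):
    """Upper-triangular pairwise Euclidean distances between joints."""
    diff = frame[:, None, :] - frame[None, :, :]   # (J, J, 3)
    dist = np.linalg.norm(diff, axis=-1)           # (J, J)
    iu = np.triu_indices(J, k=1)
    return dist[iu]                                # (J*(J-1)/2,)

def bone_angles(frame, bones):
    """Cosine of the angle between consecutive bone vectors."""
    vecs = frame[bones[:, 1]] - frame[bones[:, 0]]  # one vector per bone
    a, b = vecs[:-1], vecs[1:]
    cos = np.sum(a * b, axis=-1) / (
        np.linalg.norm(a, axis=-1) * np.linalg.norm(b, axis=-1) + 1e-8)
    return cos

# Toy bone list chaining joints 0-1, 1-2, ...; a real hand model would
# follow the finger kinematic tree instead.
bones = np.stack([np.arange(J - 1), np.arange(1, J)], axis=1)

dist_feat = np.stack([joint_distances(f) for f in seq])      # (T, 231)
angle_feat = np.stack([bone_angles(f, bones) for f in seq])  # (T, 20)
features = np.concatenate([dist_feat, angle_feat], axis=1)   # (T, 251)
print(features.shape)  # (32, 251)
```

Per-frame feature vectors of this kind can then be stacked along the time axis and fed to a 1D convolutional classifier, which is consistent with the abstract's claim of using convolutional networks alone as the backbone.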

Cite this article:

Zhao Y, Liu HC, Dong LF. Fast skeleton-based hand gesture recognition model. 计算机系统应用 (Computer Systems & Applications), 2022, 31(11): 261–267.
History
  • Received: 2022-02-24
  • Revised: 2022-03-28
  • Published online: 2022-07-07