Fast Skeleton-based Hand Gesture Recognition Model
Abstract:

For hand gesture recognition, skeleton data is compact and robust to environmental conditions. Recent studies of skeleton-based hand gesture recognition often rely on deep neural networks to extract spatial and temporal information, but such methods tend to suffer from heavy computation and large numbers of model parameters. To address this, this study presents a lightweight and efficient hand gesture recognition model. It combines two spatial geometric features calculated from skeleton sequences with automatically learned motion trajectory features, and performs gesture classification with convolutional networks alone as its backbone. The smallest variant of the proposed model has only 0.16M parameters, and its computational complexity is at most 0.03 GFLOPs. The model is evaluated on two public datasets, where it outperforms other methods that take the skeleton modality as input.
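The abstract does not specify which two geometric features the model uses, so the sketch below is only a hypothetical illustration of the general recipe, not the authors' implementation: it computes two common handcrafted spatial features from a hand skeleton sequence (pairwise joint distances and wrist-relative joint positions) and feeds them to a small temporal CNN, matching the lightweight, convolution-only design the abstract describes. All names and shapes (geometric_features, TinyGestureNet, 21 joints per frame) are assumptions.

# Minimal sketch of a lightweight skeleton-based gesture classifier.
# Assumed input: a sequence of shape (T, J, 3), T frames, J hand joints.
import torch
import torch.nn as nn

def geometric_features(seq: torch.Tensor) -> torch.Tensor:
    """seq: (T, J, 3) -> (T, F) handcrafted spatial features."""
    T, J, _ = seq.shape
    # Feature 1: pairwise joint distances (upper triangle only).
    diff = seq.unsqueeze(2) - seq.unsqueeze(1)        # (T, J, J, 3)
    dist = diff.norm(dim=-1)                          # (T, J, J)
    iu = torch.triu_indices(J, J, offset=1)
    pair_dist = dist[:, iu[0], iu[1]]                 # (T, J*(J-1)/2)
    # Feature 2: joint positions relative to the wrist (joint 0 here).
    rel = (seq - seq[:, :1, :]).reshape(T, -1)        # (T, J*3)
    return torch.cat([pair_dist, rel], dim=-1)

class TinyGestureNet(nn.Module):
    """1D CNN over time; parameter count stays well under 1M."""
    def __init__(self, in_feats: int, n_classes: int):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv1d(in_feats, 64, kernel_size=3, padding=1),
            nn.BatchNorm1d(64), nn.ReLU(),
            nn.Conv1d(64, 128, kernel_size=3, padding=1),
            nn.BatchNorm1d(128), nn.ReLU(),
            nn.AdaptiveAvgPool1d(1),                  # global temporal pooling
        )
        self.fc = nn.Linear(128, n_classes)

    def forward(self, x):                             # x: (B, T, F)
        x = self.net(x.transpose(1, 2)).squeeze(-1)   # (B, 128)
        return self.fc(x)

if __name__ == "__main__":
    seq = torch.randn(32, 21, 3)                      # 32 frames, 21 joints
    feats = geometric_features(seq)                   # (32, F) with F = 273
    model = TinyGestureNet(in_feats=feats.shape[-1], n_classes=14)
    logits = model(feats.unsqueeze(0))                # (1, 14)
    print(logits.shape, sum(p.numel() for p in model.parameters()))

Running the script prints a parameter count of roughly 0.08M for this toy configuration, the same order of magnitude as the 0.16M figure the abstract reports for the smallest variant of the proposed model.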

Get Citation

Zhao Y, Liu HC, Dong LF. Fast skeleton-based hand gesture recognition model. Computer Systems & Applications, 2022, 31(11): 261-267. (in Chinese)

History
  • Received: February 24, 2022
  • Revised: March 28, 2022
  • Online: July 07, 2022