Abstract

Skeleton-based hand gesture recognition is a challenging task that has attracted considerable attention in recent years, especially with the rise of Graph Neural Networks. In this paper, we propose a new deep learning architecture for hand gesture recognition from 3D hand skeleton data, which we call STr-GCN. It decouples the spatial and temporal learning of the gesture by leveraging Graph Convolutional Networks (GCNs) and Transformers. The key idea is to combine two powerful networks: a Spatial Graph Convolutional Network unit that models intra-frame interactions to extract powerful features from the different hand joints, and a Transformer Graph Encoder whose Temporal Self-Attention module captures inter-frame correlations. We evaluate our method on three benchmarks: the SHREC'17 Track dataset, the Briareo dataset, and the First-Person Hand Action dataset. The experiments demonstrate the effectiveness of our approach, which matches or outperforms the state of the art. The code to reproduce our results is available at this link.
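To make the decoupled spatial-temporal design concrete, the following is a minimal PyTorch sketch, not the authors' implementation: a graph-convolution unit mixes joint features within each frame, and a Transformer encoder then applies self-attention along the temporal axis. All names (SpatialGCN, STrGCNSketch), layer sizes, and the placeholder chain adjacency are illustrative assumptions.

```python
# Illustrative sketch of a decoupled spatial-temporal model in the spirit of
# STr-GCN; hyperparameters and the skeleton adjacency are assumptions.
import torch
import torch.nn as nn

class SpatialGCN(nn.Module):
    """One graph-convolution unit applied independently to every frame:
    mixes features across hand joints using a fixed, normalized adjacency."""
    def __init__(self, in_dim, out_dim, adjacency):
        super().__init__()
        # Symmetrically normalized adjacency with self-loops:
        # A_hat = D^{-1/2} (A + I) D^{-1/2}.
        a = adjacency + torch.eye(adjacency.size(0))
        d = a.sum(dim=1).rsqrt()
        self.register_buffer("a_hat", d[:, None] * a * d[None, :])
        self.linear = nn.Linear(in_dim, out_dim)

    def forward(self, x):  # x: (batch, frames, joints, in_dim)
        x = torch.einsum("ij,btjc->btic", self.a_hat, x)  # joint mixing
        return torch.relu(self.linear(x))                  # feature transform

class STrGCNSketch(nn.Module):
    """Spatial GCN per frame, then a Transformer encoder whose
    self-attention runs along the temporal axis (one token per frame)."""
    def __init__(self, adjacency, num_classes, in_dim=3, hidden=64):
        super().__init__()
        self.gcn = SpatialGCN(in_dim, hidden, adjacency)
        num_joints = adjacency.size(0)
        layer = nn.TransformerEncoderLayer(
            d_model=hidden * num_joints, nhead=4, batch_first=True)
        self.temporal = nn.TransformerEncoder(layer, num_layers=2)
        self.head = nn.Linear(hidden * num_joints, num_classes)

    def forward(self, x):  # x: (batch, frames, joints, 3) skeleton sequence
        x = self.gcn(x)                  # intra-frame joint interactions
        b, t, j, c = x.shape
        x = x.reshape(b, t, j * c)       # flatten joints: one token per frame
        x = self.temporal(x)             # inter-frame self-attention
        return self.head(x.mean(dim=1))  # classify the pooled sequence

# Toy usage: 22 joints (as in SHREC'17 skeletons), 14 gesture classes.
if __name__ == "__main__":
    joints = 22
    adjacency = torch.zeros(joints, joints)
    for i in range(joints - 1):          # placeholder chain skeleton, not the
        adjacency[i, i + 1] = 1.0        # true hand-joint graph
        adjacency[i + 1, i] = 1.0
    model = STrGCNSketch(adjacency, num_classes=14)
    clip = torch.randn(2, 32, joints, 3)  # (batch, frames, joints, xyz)
    print(model(clip).shape)              # torch.Size([2, 14])
```

The sketch keeps the two stages strictly separate, which is the point of the decoupled design: the GCN never sees time, and the Transformer never sees the graph, only per-frame feature tokens.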
