Abstract

Skeleton-based hand gesture recognition is an active research topic in computer graphics and computer vision, with a wide range of applications in VR/AR and robotics. Although spatial–temporal graph convolutional networks have been successfully applied to skeleton-based hand gesture recognition, these works often use a fixed spatial graph defined by the hand skeleton tree or a fixed graph on the temporal dimension, which may not be optimal for hand gesture recognition. In this paper, we propose a two-stream graph attention convolutional network with spatial–temporal attention for hand gesture recognition. The network takes a pose stream and a motion stream as its two inputs: the pose stream uses the joint coordinates in each frame, and the motion stream uses the joint offsets between neighboring frames. We propose a new temporal graph attention module to model temporal dependencies and use a spatial graph attention module to construct a dynamic skeleton graph. For each stream, we apply a graph convolutional network with spatial–temporal attention to extract features, and then concatenate the features of the pose and motion streams for gesture recognition. We achieve competitive performance on the main hand gesture recognition benchmark datasets, which demonstrates the effectiveness of our method.
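The abstract describes a two-stream pipeline: the pose stream takes raw joint coordinates, the motion stream takes joint offsets between neighboring frames, each stream is processed with spatial and temporal attention, and the resulting features are concatenated for classification. The following PyTorch snippet is a minimal sketch of that idea under stated assumptions: it replaces the paper's spatial and temporal graph attention modules with a simplified single-head attention over joints and frames, and all class names, layer sizes, and the default class count are illustrative, not the authors' implementation.

```python
import torch
import torch.nn as nn


def motion_from_pose(pose):
    """Motion stream: joint offsets between neighboring frames, zero-padded at t=0.
    pose: (N, T, J, C) batch of skeleton sequences."""
    offsets = pose[:, 1:] - pose[:, :-1]
    return torch.cat([torch.zeros_like(pose[:, :1]), offsets], dim=1)


class NodeAttention(nn.Module):
    """Single-head attention over a set of nodes (joints or frames),
    producing a data-dependent adjacency instead of a fixed graph."""
    def __init__(self, dim):
        super().__init__()
        self.query = nn.Linear(dim, dim)
        self.key = nn.Linear(dim, dim)
        self.value = nn.Linear(dim, dim)

    def forward(self, x):              # x: (..., num_nodes, dim)
        q, k, v = self.query(x), self.key(x), self.value(x)
        attn = torch.softmax(q @ k.transpose(-2, -1) / x.shape[-1] ** 0.5, dim=-1)
        return attn @ v                # attention-weighted node features


class StreamEncoder(nn.Module):
    """One stream: embed joints, apply spatial attention over joints,
    then temporal attention over frames, then pool to a feature vector."""
    def __init__(self, in_channels, dim):
        super().__init__()
        self.embed = nn.Linear(in_channels, dim)
        self.spatial_attn = NodeAttention(dim)
        self.temporal_attn = NodeAttention(dim)

    def forward(self, x):              # x: (N, T, J, C)
        h = self.embed(x)              # (N, T, J, dim)
        h = self.spatial_attn(h)       # attention over joints within each frame
        h = h.mean(dim=2)              # pool joints -> (N, T, dim)
        h = self.temporal_attn(h)      # attention over frames
        return h.mean(dim=1)           # pool frames -> (N, dim)


class TwoStreamGestureNet(nn.Module):
    """Two-stream model: pose and motion features are concatenated
    before the final classifier, as in the abstract."""
    def __init__(self, in_channels=3, dim=64, num_classes=14):
        super().__init__()
        self.pose_net = StreamEncoder(in_channels, dim)
        self.motion_net = StreamEncoder(in_channels, dim)
        self.classifier = nn.Linear(2 * dim, num_classes)

    def forward(self, pose):           # pose: (N, T, J, 3) joint coordinates
        motion = motion_from_pose(pose)
        feat = torch.cat([self.pose_net(pose), self.motion_net(motion)], dim=-1)
        return self.classifier(feat)


if __name__ == "__main__":
    seq = torch.randn(2, 32, 22, 3)    # 2 sequences, 32 frames, 22 joints
    logits = TwoStreamGestureNet()(seq)
    print(logits.shape)                # torch.Size([2, 14])
```

The sketch only illustrates the data flow (two input streams, attention on the spatial and temporal dimensions, feature concatenation); the actual model uses graph convolutions with learned spatial–temporal attention rather than plain attention pooling.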
