Abstract

Dynamic gesture recognition, as an important component of human-computer interaction (HCI), has attracted the attention of many researchers. However, practical gesture recognition still faces many challenges: the non-rigid nature of the hand, the need to capture different semantic cues from spatial and motion information at different locations, and variations in lighting and background in the surrounding environment, all of which can degrade the extraction of gesture features. Since dynamic gesture recognition is essentially a classification task, how the spatio-temporal features of dynamic gestures are extracted has a significant impact on the final recognition result. In recent years, three-dimensional convolutional neural networks have achieved excellent performance in action recognition, but traditional three-dimensional convolution is redundant in the feature extraction process. To reduce resource consumption, this paper adopts 3D separable convolution as an alternative. To further extract the semantic and motion information of gestures, we combine attention mechanisms with long short-term memory (LSTM) networks for gesture recognition. We conducted experiments on the ChaLearn Large-Scale Gesture Recognition Dataset (IsoGD), and the experimental results validate the effectiveness of our method.
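
The abstract does not specify how the 3D separable convolution is factorized. A minimal sketch of one common realization, a depthwise-separable 3D convolution (a per-channel 3D filter followed by a 1x1x1 pointwise convolution), is shown below in PyTorch; the class name, kernel size, and channel counts are illustrative assumptions, not the paper's reported configuration:

```python
import torch
import torch.nn as nn

class SeparableConv3d(nn.Module):
    """Illustrative depthwise-separable 3D convolution.

    Factorizes a standard Conv3d into a depthwise step (one 3D filter
    per input channel) and a pointwise 1x1x1 step that mixes channels,
    which uses far fewer parameters and multiply-adds.
    """
    def __init__(self, in_channels, out_channels, kernel_size=3,
                 stride=1, padding=1):
        super().__init__()
        # Depthwise: groups=in_channels applies one filter per channel
        self.depthwise = nn.Conv3d(in_channels, in_channels, kernel_size,
                                   stride=stride, padding=padding,
                                   groups=in_channels, bias=False)
        # Pointwise: 1x1x1 convolution combines information across channels
        self.pointwise = nn.Conv3d(in_channels, out_channels,
                                   kernel_size=1, bias=False)

    def forward(self, x):
        return self.pointwise(self.depthwise(x))

# Example: a clip of 16 RGB frames at 112x112 resolution
clip = torch.randn(1, 3, 16, 112, 112)   # (batch, channels, time, H, W)
features = SeparableConv3d(3, 64)(clip)
print(features.shape)                     # torch.Size([1, 64, 16, 112, 112])
```

For a 3x3x3 kernel mapping C_in to C_out channels, this factorization needs C_in*27 + C_in*C_out weights instead of C_in*C_out*27, which is the source of the resource savings the abstract refers to.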
