Abstract

Dynamic hand gesture recognition remains an active topic in the computer vision community. Any hand gesture can be represented as a sequence of feature vectors, and a Recurrent Neural Network (RNN) can recognize the gesture from this sequence by analyzing its temporal and contextual information. We therefore propose a hybrid deep learning framework for dynamic hand gesture recognition, in which a pretrained GoogLeNet is pipelined with a bidirectional Gated Recurrent Unit (GRU). A dynamic hand gesture spans many frames, and features must be extracted from each frame to capture the temporal dynamics of the performed gesture. Because the RNN takes a sequence of feature vectors as input, we extract per-frame features from the videos using the pretrained GoogLeNet. Since the GRU is an RNN variant well suited to classifying sequential data, we build a feature-vector sequence for each video and pass it to a bidirectional GRU (BGRU) network to classify the gestures. We evaluate our model on four publicly available hand gesture datasets; the proposed method performs well and is comparable with existing methods. We achieve 98.6% accuracy on the Northwestern University Hand Gesture (NWUHG) dataset, 99.6% on SKIG, and 99.4% on the Cambridge Hand Gesture (CHG) dataset. On the DHG14/28 dataset, we achieve 97.8% accuracy with 14 gesture classes and 92.1% with 28 gesture classes. DHG14/28 provides both skeleton and depth data; our proposed model uses the depth data and achieves comparable accuracy.
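The following is a minimal sketch of the pipeline the abstract describes, written in PyTorch: a pretrained GoogLeNet (with its classification head removed) extracts a 1024-dimensional feature vector per frame, and a bidirectional GRU classifies the resulting sequence. The hidden size (256), class count (14), and 16-frame clip length are illustrative assumptions, not values taken from the paper.

```python
# Sketch of the GoogLeNet -> bidirectional GRU gesture classifier.
# Hyperparameters here are assumptions for illustration only.
import torch
import torch.nn as nn
from torchvision.models import googlenet, GoogLeNet_Weights


class GestureBGRU(nn.Module):
    def __init__(self, num_classes: int = 14, hidden_size: int = 256):
        super().__init__()
        # Pretrained GoogLeNet used as a frozen per-frame feature extractor.
        backbone = googlenet(weights=GoogLeNet_Weights.IMAGENET1K_V1,
                             aux_logits=False)
        backbone.fc = nn.Identity()  # expose the 1024-d pooled features
        for p in backbone.parameters():
            p.requires_grad = False
        self.backbone = backbone
        # Bidirectional GRU over the per-frame feature sequence.
        self.bgru = nn.GRU(input_size=1024, hidden_size=hidden_size,
                           batch_first=True, bidirectional=True)
        self.classifier = nn.Linear(2 * hidden_size, num_classes)

    def forward(self, frames: torch.Tensor) -> torch.Tensor:
        # frames: (batch, time, 3, 224, 224)
        b, t = frames.shape[:2]
        feats = self.backbone(frames.flatten(0, 1))   # (b*t, 1024)
        feats = feats.view(b, t, -1)                  # (b, t, 1024)
        _, h_n = self.bgru(feats)                     # h_n: (2, b, hidden)
        # Concatenate the final forward and backward hidden states.
        h = torch.cat([h_n[-2], h_n[-1]], dim=1)      # (b, 2*hidden)
        return self.classifier(h)


model = GestureBGRU().eval()
video = torch.randn(1, 16, 3, 224, 224)  # one dummy 16-frame clip
with torch.no_grad():
    logits = model(video)
print(logits.shape)  # torch.Size([1, 14])
```

Concatenating the final hidden states of both GRU directions gives the classifier a summary of the clip read both forward and backward in time, which is the benefit a bidirectional recurrent layer offers over a unidirectional one.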
