Abstract

Recognizing dynamic hand gestures in real time is difficult because the system cannot know in advance when or where a gesture starts and ends in a video stream. Vision-based gesture recognition has attracted many researchers because of its wide range of applications. This paper proposes a deep learning architecture that combines a 3D Convolutional Neural Network (3D-CNN) with a Long Short-Term Memory (LSTM) network. The proposed architecture extracts spatial-temporal information from input video sequences while avoiding extensive computation. The 3D-CNN extracts spectral and spatial features, which are then passed to the LSTM network for classification. The proposed model is a lightweight architecture with only 3.7 million trainable parameters. The model was evaluated on 15 classes from the publicly available 20BN-jester dataset. It was trained on 2000 video clips per class, split into 80% training and 20% validation sets. The model achieved an accuracy of 99% on the training data and 97% on the testing data. We further show that combining a 3D-CNN with an LSTM gives superior results compared to MobileNetV2 + LSTM.
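The data split described in the abstract can be made concrete with a little arithmetic. The following is a minimal sketch, not the authors' pipeline: it assumes the 80/20 split is applied independently within each class, and the clip identifiers, the `split_per_class` helper, and the random seed are hypothetical names introduced only for illustration.

```python
import random

NUM_CLASSES = 15        # classes used from 20BN-jester (from the abstract)
CLIPS_PER_CLASS = 2000  # video clips per class (from the abstract)
TRAIN_FRACTION = 0.8    # 80% training, 20% validation (from the abstract)


def split_per_class(clip_ids_by_class, train_fraction=TRAIN_FRACTION, seed=0):
    """Shuffle each class independently and cut it at train_fraction.

    Assumption: the split is stratified by class; the source does not say so.
    """
    rng = random.Random(seed)
    train, val = {}, {}
    for label, clip_ids in clip_ids_by_class.items():
        shuffled = list(clip_ids)
        rng.shuffle(shuffled)
        cut = round(len(shuffled) * train_fraction)
        train[label] = shuffled[:cut]
        val[label] = shuffled[cut:]
    return train, val


# Hypothetical clip identifiers: (class index, clip index) pairs.
clips = {
    label: [(label, i) for i in range(CLIPS_PER_CLASS)]
    for label in range(NUM_CLASSES)
}
train, val = split_per_class(clips)

train_total = sum(len(ids) for ids in train.values())
val_total = sum(len(ids) for ids in val.values())
print(train_total, val_total)  # 24000 6000
```

Under these assumptions, each class contributes 1600 training clips and 400 validation clips, for 30000 clips in total.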

Highlights

  • Gestures are a primary tool of symbolic communication and a natural form through which humans express themselves effectively

  • Tab. 2 shows a comparison of the proposed 3D Convolutional Neural Network (3D-CNN) + Long Short-Term Memory (LSTM) model with other models in terms of accuracy, precision and recall using the 20BN-jester dataset for 15 classes

  • L2 batch normalization was introduced to the MobileNetV2 + LSTM model and the accuracy improved to 87%, which was better but still not acceptable compared with other techniques proposed in the literature
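The highlights compare models by accuracy, precision, and recall. As a rough illustration of how these three metrics relate, the sketch below computes them from a confusion matrix. The `per_class_metrics` helper and the toy 3-class counts are made up for this example; the source does not state how the per-class values were averaged, so the sketch reports them per class.

```python
def per_class_metrics(confusion):
    """Return (accuracy, precision, recall) from a square confusion matrix.

    confusion[i][j] is the number of clips whose true class is i and whose
    predicted class is j. Precision and recall are lists with one value per class.
    """
    n = len(confusion)
    total = sum(sum(row) for row in confusion)
    correct = sum(confusion[k][k] for k in range(n))
    accuracy = correct / total if total else 0.0

    precision, recall = [], []
    for k in range(n):
        true_positives = confusion[k][k]
        predicted_k = sum(confusion[i][k] for i in range(n))  # column sum
        actual_k = sum(confusion[k])                          # row sum
        precision.append(true_positives / predicted_k if predicted_k else 0.0)
        recall.append(true_positives / actual_k if actual_k else 0.0)
    return accuracy, precision, recall


# Toy 3-class confusion matrix with made-up counts (not results from the paper).
toy = [
    [8, 1, 1],
    [2, 7, 1],
    [0, 0, 10],
]
accuracy, precision, recall = per_class_metrics(toy)
print(accuracy, precision, recall)
```

For the 15-class setting described here, the same function would take a 15 x 15 matrix built from the model's predictions on the held-out clips.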

Summary

Introduction

Gestures are a primary tool of symbolic communication and a natural form through which humans express themselves effectively. They range from simple to more complex actions that allow us to communicate with others. Because the hand is the most flexible part of the human body, hand gestures can express rich and varied forms of communication between humans and machines. They are widely used for communication between humans and computers or other electronic devices such as smartphones, robots, and automobile infotainment systems. Gesture recognition can replace touch-based or wired input devices for human-computer interaction [1].

CMC, 2022, vol., no.3

