Abstract
Hand gesture recognition is an attractive research field with a wide range of applications, including video games and telesurgery. Another important application is the translation of sign language, which is a complex, structured form of hand gestures. In sign language, the fingers' configuration, the hand's orientation, and the hand's position relative to the body are the primitives of structured expressions. The importance of hand gesture recognition has increased due to the prevalence of touchless applications and the rapid growth of the hearing-impaired population. However, developing an efficient recognition system requires overcoming the challenges of hand segmentation, local hand shape representation, global body configuration representation, and gesture sequence modeling. In this paper, a novel system is proposed for dynamic hand gesture recognition using multiple deep learning architectures for hand segmentation, local and global feature representations, and sequence feature globalization and recognition. The proposed system is evaluated on a challenging dataset of 40 dynamic hand gestures performed by 40 subjects in an uncontrolled environment. The results show that the proposed system outperforms state-of-the-art approaches, demonstrating its effectiveness.
Highlights
Hand gesture recognition is the first step for a computer to understand human body language
Our experimental results showed that using temporal modeling enhancement can improve the performance of the 3D convolutional neural network (3DCNN) model
This study proposed a novel system for dynamic hand gesture recognition via a combination of multiple deep learning techniques
Summary
Hand gesture recognition is the first step for a computer to understand human body language. It plays a pivotal role in a wide range of human-computer interaction (HCI) applications such as smart TV control, video games, telesurgery, and virtual reality [1]. The hand gestures used in sign language have a complex structure because they convey important communicative information and feelings. The primitives of these manual expressions are the global configuration (the hand's orientation and its position relative to the body) and the local configuration of the fingers. The time dependence of these frames makes it difficult to directly compare the primitives in Euclidean space.