Hand gestures are an essential part of human-to-human communication and interaction and, therefore, of technical applications. The aim is increasingly to achieve interaction between humans and computers that is as natural as possible, for example, by means of natural language or hand gestures. In the context of human-machine interaction research, these methods are consequently being explored more and more. However, the realization of natural communication between humans and computers is a major challenge. In the field of hand gesture recognition, research approaches are being pursued that use additional hardware, such as special gloves, to classify gestures with high accuracy. Recently, deep learning techniques using artificial neural networks have been increasingly proposed for the problem of gesture recognition without using such tools. In this context, we explore the approach of convolutional neural network (CNN) in detail for the task of hand gesture recognition. CNN is a deep neural network that can be used in the fields of visual object processing and classification. The goal of this work is to recognize ten types of static hand gestures in front of complex backgrounds and different hand sizes based on raw images without the use of extra hardware. We achieved good results with a CNN network architecture consisting of seven layers. Through data augmentation and skin segmentation, a significant increase in the model’s accuracy was achieved. On public benchmarks, two challenging datasets have been classified almost perfectly, with testing accuracies of 96.5% and 96.57%.
Read full abstract