Sign language (SL) recognition aims to bridge communication between deaf people and the general population, drawing on a variety of perspectives, experiences, and skills that serve as a basis for the development of human-computer interaction. Hand gesture-based SL recognition encompasses a wide range of human capabilities and perspectives. Recognizing hand gestures efficiently remains challenging because of varying levels of illumination, diversity, multiple viewing aspects, self-identifying parts, different hand shapes and sizes, and complex backgrounds. In this context, we present an American Sign Language (ASL) alphabet recognition system that translates sign gestures into text and builds meaningful sentences from continuously performed gestures. We propose a segmentation technique for hand gestures and a convolutional neural network (CNN) based on feature fusion. Input images are captured directly from video with a low-cost device such as a webcam and are pre-processed by filtering and segmentation, for example with the Otsu method. A CNN then extracts features, which are fused in a fully connected layer, and a Softmax classifier recognizes the sign gestures. We also propose a dataset for this work that contains only static images of hand gestures, collected in a laboratory environment. The results show that the proposed system achieves higher recognition accuracy than other state-of-the-art systems.
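The processing steps named in the abstract (Otsu segmentation, feature fusion in a fully connected layer, Softmax classification) can be sketched in NumPy as below. This is a minimal illustration under stated assumptions, not the authors' implementation: the input is assumed to be an 8-bit grayscale frame, fusion is assumed to be concatenation of two feature vectors (for example from two CNN branches) followed by one linear layer, and the names `otsu_threshold`, `fuse_and_classify`, and the random weights are hypothetical.

```python
import numpy as np


def otsu_threshold(gray: np.ndarray) -> int:
    """Return the Otsu threshold t for an 8-bit grayscale image.

    Pixels <= t form class 0 (background) and pixels > t form class 1.
    Otsu's method picks the t that maximizes the between-class variance
    w0 * w1 * (mu0 - mu1)^2.
    """
    hist = np.bincount(gray.ravel(), minlength=256).astype(np.float64)
    prob = hist / hist.sum()
    levels = np.arange(256)
    w0 = np.cumsum(prob)                 # class-0 weight for every candidate t
    w1 = 1.0 - w0                        # class-1 weight
    cum_mean = np.cumsum(prob * levels)  # unnormalized class-0 mean
    total_mean = cum_mean[-1]
    with np.errstate(divide="ignore", invalid="ignore"):
        mu0 = cum_mean / w0
        mu1 = (total_mean - cum_mean) / w1
        sigma_b = w0 * w1 * (mu0 - mu1) ** 2
    # Empty classes produce NaN/inf; treat them as zero variance.
    sigma_b = np.nan_to_num(sigma_b, nan=0.0, posinf=0.0, neginf=0.0)
    return int(np.argmax(sigma_b))


def softmax(logits: np.ndarray) -> np.ndarray:
    """Numerically stable softmax over the last axis."""
    shifted = logits - np.max(logits, axis=-1, keepdims=True)
    exp = np.exp(shifted)
    return exp / np.sum(exp, axis=-1, keepdims=True)


def fuse_and_classify(feat_a, feat_b, weight, bias):
    """Hypothetical fusion: concatenate two feature vectors, apply one
    fully connected (linear) layer, and return class probabilities."""
    fused = np.concatenate([feat_a, feat_b], axis=-1)
    logits = fused @ weight + bias
    return softmax(logits)


if __name__ == "__main__":
    # Synthetic frame: dark left half (background), bright right half (hand).
    frame = np.zeros((10, 10), dtype=np.uint8)
    frame[:, :5] = 50
    frame[:, 5:] = 200
    t = otsu_threshold(frame)
    hand_mask = frame > t
    print("threshold:", t, "foreground pixels:", int(hand_mask.sum()))

    # Random stand-ins for CNN features and trained FC weights.
    rng = np.random.default_rng(0)
    num_classes = 5  # hypothetical; a real ASL alphabet model would use more
    probs = fuse_and_classify(
        rng.normal(size=(1, 8)), rng.normal(size=(1, 8)),
        rng.normal(size=(16, num_classes)), np.zeros(num_classes),
    )
    print("predicted class:", int(np.argmax(probs)))
```

Concatenation is only one common way to fuse features; element-wise addition or averaging are alternatives, and the abstract does not specify which the authors use.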