Abstract

Sign language (SL) recognition aims to connect deaf people with the general population by drawing on a variety of perspectives, experiences, and skills, and serves as a basis for the development of human-computer interaction. Hand gesture-based SL recognition encompasses a wide range of human capabilities and perspectives. Reliable recognition of hand gestures remains challenging due to varying levels of illumination, signer diversity, multiple viewpoints, self-occluding hand parts, different hand shapes and sizes, and complex backgrounds. In this context, we present an American Sign Language (ASL) alphabet recognition system that translates sign gestures into text and builds meaningful sentences from continuously performed gestures. We propose a segmentation technique for hand gestures and present a convolutional neural network (CNN) based on the fusion of features. The input image is captured directly from video via a low-cost device such as a webcam and is pre-processed with a filtering and segmentation technique such as Otsu's method. A CNN then extracts the features, which are fused in a fully connected layer, and a Softmax classifier recognizes the sign gestures. We also present a dataset for this work that contains only static images of hand gestures, collected in a laboratory environment. An analysis of the results shows that the proposed system achieves better recognition accuracy than other state-of-the-art systems.
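To make the pre-processing step concrete, the following is a minimal sketch of Otsu-based hand segmentation using OpenCV; the function name, Gaussian kernel size, and the inverted binary threshold are illustrative assumptions rather than the authors' exact implementation.

```python
import cv2

def segment_hand(roi_bgr):
    """Segment a hand gesture from a cropped webcam frame (illustrative only)."""
    gray = cv2.cvtColor(roi_bgr, cv2.COLOR_BGR2GRAY)      # grayscale conversion
    blurred = cv2.GaussianBlur(gray, (5, 5), 0)           # noise filtering
    # Otsu's method selects the threshold automatically from the image histogram
    _, mask = cv2.threshold(blurred, 0, 255,
                            cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU)
    segmented = cv2.bitwise_and(roi_bgr, roi_bgr, mask=mask)
    return mask, segmented
```

The binary mask (or the masked colour image) can then be resized and fed to the network, depending on how the CNN input is defined.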

Highlights

  • Sign language (SL) involves movements of different parts of the body, for example the face and hands, which deaf and hearing-impaired people use to interact with hearing people

  • Description of the image dataset: to create the dataset, data were taken from a live camera, and hand gesture images were obtained from the region of interest (ROI) and used as input (see the capture sketch after this list)

  • The training and test datasets contained hand gestures performed by the same individual, while different individuals performed real-time American Sign Language (ASL) alphabet recognition and sentence interpretation based on the trained model
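As a rough illustration of how dataset frames might be captured from a live camera, the sketch below uses OpenCV to grab webcam frames and crop a fixed region of interest; the ROI coordinates and key binding are hypothetical, since the source text does not specify them.

```python
import cv2

# Hypothetical ROI (top, bottom, left, right) inside the frame; the actual
# region used during data collection is not specified in the source text.
TOP, BOTTOM, LEFT, RIGHT = 50, 300, 350, 600

cap = cv2.VideoCapture(0)                      # low-cost webcam input
try:
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        roi = frame[TOP:BOTTOM, LEFT:RIGHT]    # crop the hand-gesture region
        cv2.rectangle(frame, (LEFT, TOP), (RIGHT, BOTTOM), (0, 255, 0), 2)
        cv2.imshow("frame", frame)
        cv2.imshow("roi", roi)
        if cv2.waitKey(1) & 0xFF == ord("q"):  # press 'q' to stop capturing
            break
finally:
    cap.release()
    cv2.destroyAllWindows()
```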


Summary

Introduction

Sign language (SL) involves movements of different parts of the body, such as the face and hands, which deaf and hearing-impaired people use to interact with hearing people. Hand gesture-based recognition systems for isolated sign words were proposed in [4], in which feature fusion techniques were used to detect the sign words. In [6], a classification system based on hidden Markov models and a depth sensor device was proposed to learn the sign gestures. Depth sensor-based ASL recognition was proposed in [9]; this system was developed using 26 characters and 10 numbers. In the present work, a non-wearable device is used to collect input for 26 ASL alphabet gestures, which is then pre-processed using the proposed method. The rest of this paper is structured as follows: Section 2 briefly discusses the details of the image dataset, the pre-processing of input images, feature extraction, and the classification processes of the proposed system.
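The feature-fusion idea mentioned above can be pictured as a small two-branch CNN whose flattened features are concatenated in a fully connected layer before a Softmax output; the filter counts, kernel sizes, input resolution, and number of branches below are assumptions for illustration, not the architecture reported in the paper.

```python
from tensorflow.keras import layers, Model

def build_fusion_cnn(input_shape=(64, 64, 1), num_classes=26):
    """Illustrative two-branch CNN with feature fusion (assumed architecture)."""
    inputs = layers.Input(shape=input_shape)

    # Branch 1: 3x3 convolutions
    x1 = layers.Conv2D(32, 3, activation="relu", padding="same")(inputs)
    x1 = layers.MaxPooling2D()(x1)
    x1 = layers.Flatten()(x1)

    # Branch 2: 5x5 convolutions
    x2 = layers.Conv2D(32, 5, activation="relu", padding="same")(inputs)
    x2 = layers.MaxPooling2D()(x2)
    x2 = layers.Flatten()(x2)

    # Feature fusion: concatenate branch features in a fully connected layer
    fused = layers.Concatenate()([x1, x2])
    fused = layers.Dense(128, activation="relu")(fused)

    # Softmax classifier over the sign classes
    outputs = layers.Dense(num_classes, activation="softmax")(fused)
    return Model(inputs, outputs)

model = build_fusion_cnn()
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
```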

Proposed System
Image Pre-processing
Feature Extraction and Classification
Experimental Results and Analysis
Hand Gesture Segmentation
24 ASL Alphabets
10 ASL Numbers
Conclusion