Abstract

Sign language recognition systems enable communication between deaf-mute people and hearing users. Spatial localization of the hands can be challenging when the hands occupy only 10% of the entire image. This work addresses the problem with a real-time system that performs extraction, recognition, and classification within a single deep convolutional network. Recognition is performed on static image datasets with simple and complex backgrounds and on a dynamic video dataset. The static image datasets are trained and tested using a 2D deep convolutional neural network, whereas the dynamic video dataset is trained and tested using a 3D deep convolutional neural network. Spatial augmentation is used to increase the number of images in the static datasets, and key-frame extraction is used to select the key frames from the videos in the dynamic dataset. To improve system performance and accuracy, a batch normalization layer is added to the convolutional network. The accuracy is nearly 99% for the dataset with a simple background, 92% for the dataset with a complex background, and 84% for the video dataset. These results indicate that the system is effective at recognizing and interpreting sign language gestures in real time.
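The abstract does not say how key frames are selected, so the following is only an illustrative sketch of one common approach: keep a frame when it differs enough from the last kept frame. The function names `mean_abs_diff` and `extract_key_frames`, the `threshold` parameter, and the representation of a frame as nested lists of grayscale intensities are all assumptions made for this example, not details from the paper.

```python
from typing import List, Sequence

# Assumption: a frame is a grayscale image given as rows of pixel intensities.
Frame = Sequence[Sequence[float]]


def mean_abs_diff(a: Frame, b: Frame) -> float:
    """Mean absolute pixel difference between two equally sized frames."""
    total = 0.0
    count = 0
    for row_a, row_b in zip(a, b):
        for pa, pb in zip(row_a, row_b):
            total += abs(pa - pb)
            count += 1
    return total / count if count else 0.0


def extract_key_frames(frames: Sequence[Frame], threshold: float) -> List[int]:
    """Return indices of frames that differ from the last kept frame
    by more than `threshold` (hypothetical selection rule).

    The first frame is always kept as the initial reference.
    """
    if not frames:
        return []
    kept = [0]
    for i in range(1, len(frames)):
        if mean_abs_diff(frames[kept[-1]], frames[i]) > threshold:
            kept.append(i)
    return kept


# Tiny synthetic "video": two near-duplicate frames are skipped.
video = [
    [[0, 0], [0, 0]],
    [[0, 0], [0, 1]],
    [[10, 10], [10, 10]],
    [[10, 10], [10, 11]],
    [[0, 0], [0, 0]],
]
key_indices = extract_key_frames(video, threshold=2.0)
```

In this sketch the selected frames would then be stacked along a time axis to form the input of a 3D convolutional network. The threshold trades off temporal coverage against redundancy and would need tuning per dataset.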
