Abstract

Hand gesture recognition has attracted the attention of many researchers due to its wide applications in robotics, games, virtual reality, sign language and human-computer interaction. Sign language is a structured form of hand gestures and the most effective means of communication among hearing-impaired people. Developing an efficient sign language recognition system to recognize dynamic isolated gestures encounters three major challenges, namely, hand segmentation, hand shape feature representation and gesture sequence recognition. Traditional sign language recognition methods rely on color-based hand segmentation algorithms to segment hands, hand-crafted features for hand shape representation, and Hidden Markov Models (HMMs) for sequence recognition. In this paper, a novel framework is proposed for signer-independent sign language recognition using multiple deep learning architectures comprising hand semantic segmentation, hand shape feature representation and a deep recurrent neural network. The recently developed semantic segmentation method DeepLabv3+ is trained using a set of pixel-labeled hand images to extract hand regions from each frame of the input video. The extracted hand regions are then cropped and scaled to a fixed size to alleviate hand scale variations. Hand shape features are extracted using a single-layer Convolutional Self-Organizing Map (CSOM) instead of relying on transfer learning from pre-trained deep convolutional neural networks. The sequence of extracted feature vectors is then recognized using a deep Bi-directional Long Short-Term Memory (BiLSTM) recurrent neural network. The network contains three BiLSTM layers followed by a fully connected layer and a softmax layer. The performance of the proposed method is evaluated using a challenging Arabic sign language database containing 23 isolated words captured from three different users. Experimental results show that the proposed framework outperforms state-of-the-art methods by a large margin under the signer-independent testing strategy.
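
To make the recognition stage concrete, the following is a minimal PyTorch sketch of a deep BiLSTM classifier with three BiLSTM layers, one fully connected layer, and a softmax output, as described above. The feature dimension, hidden size, and sequence length are illustrative assumptions, and stacking the layers via num_layers=3 is one reasonable reading of "three BiLSTM layers", not the authors' exact implementation.

    import torch
    import torch.nn as nn

    class GestureBiLSTM(nn.Module):
        """Deep BiLSTM gesture classifier: three stacked BiLSTM layers,
        a fully connected layer, and a softmax output (23 word classes).
        Feature and hidden dimensions are illustrative assumptions."""

        def __init__(self, feat_dim=256, hidden=128, num_classes=23):
            super().__init__()
            # bidirectional=True doubles the output width (forward + backward states)
            self.bilstm = nn.LSTM(feat_dim, hidden, num_layers=3,
                                  bidirectional=True, batch_first=True)
            self.fc = nn.Linear(2 * hidden, num_classes)

        def forward(self, x):                # x: (batch, frames, feat_dim)
            out, _ = self.bilstm(x)          # (batch, frames, 2 * hidden)
            return self.fc(out[:, -1, :])    # last time step summarizes the gesture

    # Example: a batch of 4 videos, 30 frames each, 256-d features per frame.
    logits = GestureBiLSTM()(torch.randn(4, 30, 256))
    probs = torch.softmax(logits, dim=1)     # probabilities over the 23 words

During training the logits would typically be paired with a cross-entropy loss, which applies the softmax internally.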

Highlights

  • Hand gestures are commonly used among people to convey their thoughts and feelings [1]

  • It can be inferred from the results that using the DeepLabv3+ semantic segmentation module significantly increases performance, by 70% (a sketch of this segmentation-and-cropping step follows this list)

  • This paper proposes a new framework for signer-independent isolated Arabic sign language recognition based on the combination of DeepLabv3+ semantic segmentation, a single-layer convolutional SOM, and a Bi-directional Long Short-Term Memory network
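
As referenced in the second highlight, below is a minimal sketch of how the segmentation-and-cropping stage might be wired. torchvision ships DeepLabv3 rather than DeepLabv3+, so it is used here as a close stand-in; the two-class (background/hand) head, the frame size, and the 64-pixel output crop are assumptions, and in the paper the network is first trained on pixel-labeled hand images, which the untrained model below does not reproduce.

    import torch
    import torch.nn.functional as F
    import torchvision

    # Two classes: background and hand. DeepLabv3 (torchvision) stands in
    # for the paper's DeepLabv3+; it would be trained on pixel-labeled
    # hand images before use.
    model = torchvision.models.segmentation.deeplabv3_resnet50(
        weights=None, num_classes=2).eval()

    def crop_hand(frame, out_size=64):
        """Segment the hand in one RGB frame, crop its bounding box,
        and rescale to a fixed size to reduce hand scale variation."""
        with torch.no_grad():
            logits = model(frame.unsqueeze(0))["out"]   # (1, 2, H, W)
        mask = logits.argmax(dim=1)[0]                  # (H, W), 1 = hand
        ys, xs = torch.nonzero(mask, as_tuple=True)
        if ys.numel() == 0:                             # no hand detected
            return None
        patch = frame[:, int(ys.min()):int(ys.max()) + 1,
                         int(xs.min()):int(xs.max()) + 1]
        return F.interpolate(patch.unsqueeze(0), size=(out_size, out_size),
                             mode="bilinear", align_corners=False)[0]

    hand = crop_hand(torch.rand(3, 224, 224))           # dummy video frame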

Summary

INTRODUCTION

Hand gestures are commonly used among people to convey their thoughts and feelings [1]. Signer-independent dynamic sign language recognition systems face three challenges: (1) hand segmentation/detection, (2) hand shape feature representation, and (3) sequence classification. Hand segmentation is an essential step for locating the gesture region-of-interest and building an efficient signer-independent sign language recognition system [4]. Various deep convolutional neural networks have been developed to tackle the sign language recognition problem [8]–[12]. We propose an Arabic sign language recognition framework that combines three different deep learning architectures. The contributions of this paper are as follows: 1) A new framework is developed for signer-independent isolated Arabic sign language gesture recognition based on the combination of a semantic segmentation network, a convolutional SOM, and a deep Bi-directional LSTM network.
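
Since the single-layer convolutional SOM is the least standard component of the framework, the following NumPy sketch shows one generic way such a layer can work: a 1-D SOM is trained on random image patches, its prototype vectors are used as convolution filters, and each response map is globally max-pooled into a fixed-length per-frame descriptor. The patch size, filter count, learning-rate schedule, and pooling choice are all illustrative assumptions, not the authors' exact procedure.

    import numpy as np

    rng = np.random.default_rng(0)

    def train_csom_filters(images, n_filters=16, k=5, iters=2000, lr0=0.5):
        """Learn k-by-k convolution filters with a 1-D SOM trained on
        random image patches (generic sketch of a convolutional SOM)."""
        W = rng.standard_normal((n_filters, k * k))      # prototypes = filters
        for t in range(iters):
            img = images[rng.integers(len(images))]
            y, x = rng.integers(img.shape[0] - k), rng.integers(img.shape[1] - k)
            p = img[y:y + k, x:x + k].ravel()            # random training patch
            bmu = np.argmin(((W - p) ** 2).sum(axis=1))  # best-matching unit
            lr = lr0 * (1 - t / iters)                   # decaying learning rate
            sigma = max(1.0, n_filters / 2 * (1 - t / iters))
            h = np.exp(-((np.arange(n_filters) - bmu) ** 2) / (2 * sigma ** 2))
            W += lr * h[:, None] * (p - W)               # neighborhood update
        return W.reshape(n_filters, k, k)

    def csom_features(img, filters):
        """Convolve a frame with the SOM filters and globally max-pool
        each response map into a fixed-length descriptor."""
        k = filters.shape[1]
        win = np.lib.stride_tricks.sliding_window_view(img, (k, k))
        resp = np.tensordot(win, filters, axes=([2, 3], [1, 2]))
        return resp.max(axis=(0, 1))                     # one value per filter

    crops = [rng.random((64, 64)) for _ in range(8)]     # dummy hand crops
    bank = train_csom_filters(crops)
    vec = csom_features(crops[0], bank)                  # 16-d frame feature

Applied frame by frame, such descriptors form the feature-vector sequence that the BiLSTM classifier consumes.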
