A comprehensive survey on recent advances and challenges in sign language recognition systems
- Research Article
33
- 10.1007/s41870-020-00518-5
- Oct 19, 2020
- International Journal of Information Technology
According to the World Health Organization (WHO), 466 million people suffer from hearing loss, about 5% of the world population, of which 432 million (93%) are adults and 34 million (7%) are children. The main problem is how deaf and hearing-impaired people communicate with others and with each other, how they receive an education, and how they carry out their daily activities. Sign language is their main means of communication. Building an automatic hand-gesture recognition system poses many challenges, especially for Arabic. Solving the recognition problem and developing a practical real-time recognition system is a further challenge. Several studies have been conducted on sign language recognition systems, but those for Arabic Sign Language are very limited. In this paper, an Arabic Sign Language (ArSL) recognition system that uses a Leap Motion Controller and a Latte Panda is introduced. The recognition phase depends on two machine learning algorithms: (a) KNN (k-Nearest Neighbor) and (b) SVM (Support Vector Machine). Afterwards, an AdaBoost technique is applied to enhance the accuracy of both algorithms. A direct matching technique, DTW (Dynamic Time Warping), is applied and compared with AdaBoost. The proposed system is applied to 30 hand gestures, composed of 20 single-hand gestures and 10 double-hand gestures. The experimental results show that DTW achieved an accuracy of 88% for single-hand gestures and 86% for double-hand gestures. Overall, the proposed model's recognition rate reached 92.3% for single-hand gestures and 93% for double-hand gestures after applying AdaBoost. Finally, a prototype of our model was implemented on a single board (Latte Panda) to increase the system's reliability and mobility.
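The DTW matching the abstract mentions can be sketched with the classic dynamic-programming recurrence. This is a minimal illustration over 1-D feature sequences, not the authors' implementation (Leap Motion features are multi-dimensional in practice):

```python
def dtw_distance(a, b):
    """Dynamic Time Warping distance between two 1-D sequences."""
    n, m = len(a), len(b)
    INF = float("inf")
    # D[i][j] = minimal cumulative cost of aligning a[:i] with b[:j]
    D = [[INF] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])
            # extend the cheapest of the three admissible warping steps
            D[i][j] = cost + min(D[i - 1][j], D[i][j - 1], D[i - 1][j - 1])
    return D[n][m]

# sequences of different lengths can still match with zero cost
print(dtw_distance([1, 2, 3], [1, 2, 2, 3]))  # -> 0.0
```

For gesture matching, a query sequence would be labeled with the class of the training sequence at the smallest DTW distance.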
- Research Article
18
- 10.3390/sym13020262
- Feb 4, 2021
- Symmetry
Sign language is a type of language for the hearing impaired that the general public commonly does not understand. A sign language recognition system, therefore, represents an intermediary between the two sides. As a communication tool, a multi-stroke Thai finger-spelling sign language (TFSL) recognition system featuring deep learning was developed in this study. This research uses a vision-based technique on a complex background, with semantic segmentation performed with dilated convolution for hand segmentation, hand strokes separated using optical flow, and feature learning and classification done with a convolutional neural network (CNN). We then compared five CNN structures with different configurations. The first format set the number of filters to 64 and the filter size to 3 × 3 with 7 layers; the second format used 128 filters, each 3 × 3 in size, with 7 layers; the third format used filter counts in ascending order across 7 layers, all with an equal 3 × 3 filter size; the fourth format used filter counts in ascending order and small filter sizes with 7 layers; the final format was a structure based on AlexNet. The resulting average accuracies were 88.83%, 87.97%, 89.91%, 90.43%, and 92.03%, respectively. We implemented the CNN structure based on AlexNet to create models for multi-stroke TFSL recognition systems. The experiment was performed using isolated videos of 42 Thai letters, divided into three categories consisting of one stroke, two strokes, and three strokes. The results showed an average accuracy of 88.00% for one stroke, 85.42% for two strokes, and 75.00% for three strokes.
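The compared formats can be written down compactly as per-layer specifications. A sketch; the concrete ascending filter counts below are illustrative assumptions, since the abstract only states "ascending order":

```python
def cnn_format(filters_per_layer, kernels_per_layer):
    """Describe a CNN structure as (num_filters, kernel_size) per conv layer."""
    return list(zip(filters_per_layer, kernels_per_layer))

fmt1 = cnn_format([64] * 7, [3] * 7)    # 64 filters, 3x3 kernels, 7 layers
fmt2 = cnn_format([128] * 7, [3] * 7)   # 128 filters, 3x3 kernels, 7 layers
# formats 3-4: filter counts ascend across the 7 layers (counts assumed here)
fmt3 = cnn_format([16, 32, 64, 128, 256, 512, 1024], [3] * 7)
fmt4 = cnn_format([16, 32, 64, 128, 256, 512, 1024], [3, 3, 3, 3, 2, 2, 2])
```

Format 5, per the abstract, follows AlexNet's published structure rather than a hand-set scheme like these.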
- Conference Article
4
- 10.1117/12.382901
- Apr 14, 2000
An optical modeless Sign Language Recognition (SLR) system is presented. The system uses the HAusdorff-Voronoi NETwork (HAVNET), an artificial neural network designed for 2D binary pattern recognition. It uses an adaptation of the Hausdorff distance to determine the similarity between an input pattern and a learned representation. A detailed review of the architecture, the learning equations, and the recognition equations of the HAVNET network is presented. Competitive learning has been implemented in training the network using a nearest-neighbor technique. The SLR system is applied to the optical recognition of 24 static symbols from the American Sign Language convention. The SLR system represents the target images in an 80 × 80 pixel format. The implemented HAVNET network classifies the inputs into categories representing each of the symbols, using an output layer of 24 nodes. The network is trained with 5 different formats for each symbol and is tested with all 24 symbols in 15 new formats. Results from the SLR system without competitive training show shape-identification problems when distinguishing symbols with similar shapes. Implementing competitive learning in the HAVNET neural network improved recognition accuracy on this task to 89%. The hand gestures are located through a window search algorithm. Feature recognition is obtained from edge enhancement by applying a Laplacian filter and thresholding, which provides robustness to pose, color, and background variations.
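The Hausdorff similarity at the heart of HAVNET can be illustrated on 2-D point sets. A minimal sketch of the plain distance; the network itself uses an adaptation of it, not this exact form:

```python
def directed_hausdorff(A, B):
    """Max over points in A of the distance to the nearest point in B."""
    return max(
        min(((ax - bx) ** 2 + (ay - by) ** 2) ** 0.5 for bx, by in B)
        for ax, ay in A
    )

def hausdorff(A, B):
    """Symmetric Hausdorff distance between two point sets."""
    return max(directed_hausdorff(A, B), directed_hausdorff(B, A))

# B covers (0, 0) but is 1 unit away from A's farthest point
print(hausdorff([(0, 0), (1, 0)], [(0, 0)]))  # -> 1.0
```

A small Hausdorff distance between an input's "on" pixels and a stored template's "on" pixels indicates a likely match.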
- Research Article
82
- 10.1016/j.ins.2017.10.046
- Oct 27, 2017
- Information Sciences
Independent Bayesian classifier combination based sign language recognition using facial expression
- Book Chapter
3
- 10.1007/978-981-33-6984-9_30
- Jan 1, 2021
Sign language recognition systems help the hearing and vocally impaired communicate with the verbally speaking community. This paper proposes an Indian Sign Language (ISL) recognition system capable of recognizing isolated double-handed dynamic gestures involving facial expressions. The recognition system uses skin color segmentation to segment face and hand regions from the video frames. Histogram of oriented gradients (HOG) features are computed over the segmented frames, and a multi-class neural network is used for classification. The proposed system can recognize ten single- and double-handed dynamic gestures with an accuracy of 81%. Notably, this work does not use any sophisticated sensors or computationally intensive methods such as hand tracking, yet still achieves remarkable accuracy.
- Research Article
12
- 10.5121/ijcsa.2014.4207
- Apr 30, 2014
- International Journal on Computational Science & Applications
Sign Language is a means of communication for audibly challenged people. To provide an interface between the audibly challenged community and the rest of the world, we need sign language translators. A sign language recognition system computerizes the work of a sign language translator. Every Sign Language Recognition (SLR) system is trained to recognize specific sets of signs and correspondingly outputs the sign in the required format. These SLR systems are built with powerful image processing techniques. Sign language recognition systems are capable of recognizing a specific set of signing gestures and outputting the corresponding text/audio. Most of these systems involve the techniques of detection, segmentation, tracking, gesture recognition, and classification. This paper proposes a design for an SLR system.
- Research Article
- 10.65521/ijacect.v14i1.201
- Apr 14, 2025
- International Journal on Advanced Computer Engineering and Communication Technology
Sign language is a vital way for people with hearing impairments to communicate, but many people do not know how to use it. Sign language recognition (SLR) systems help by using artificial intelligence (AI) and computer vision to convert sign gestures into text or speech. This study proposes a convolutional neural network (CNN)-based SLR model for recognizing numeric gestures in sign language. The proposed model is trained on a digit- and alphabet-based dataset to ensure accurate classification of hand gestures. We developed a deep-learning model that recognizes the hand gestures reliably, comparing a Sequential CNN model with a pre-trained DenseNet201 model. The Sequential model achieved an accuracy of 95.43%, while the DenseNet201 model performed better with an accuracy of 99.41%. These results show that our approach is highly effective in correctly identifying sign language gestures.
- Research Article
55
- 10.1080/02564602.2014.961576
- Sep 3, 2014
- IETE Technical Review
Sign language is a communication tool for deaf and mute people that uses known signs or body gestures to transfer meanings. It uses shapes, directions, and movements of the hands, as well as facial expressions. A sign not only transmits a word but also conveys a tone. Many deaf people are not only unable to speak but also unable to read or write a language, so developing a sign language translation, or sign language recognition (SLR), system can be vital in their lives. SLR is highly desired because of its capability to overcome the barriers between deaf and hearing people. It is one of the most important research fields in human-computer interaction studies. Hence, this paper presents an overview of the main recent research on vision-based SLR systems and discusses the existing recognition techniques. Next, we focus on video-based SLR and perform continuous SLR within video sequences.
- Research Article
5
- 10.4314/njtd.v19i3.2
- Sep 23, 2022
- Nigerian Journal of Technological Development
Sign language is used by people who have hearing and speaking difficulties, but it is not understood by many without these difficulties. Therefore, sign language recognition systems are developed to aid communication between hearing-impaired people and others. This paper developed a static American Sign Language Recognition (ASLR) system using Canny edge detection and the histogram of oriented gradients (HOG) for feature extraction, with K-Nearest Neighbour (K-NN) as the classifier. The sign language image datasets used consist of English alphabet letters from both Massey University and Kaggle, and numbers (0-9) from Massey University. A median filter was used to remove noise after images were converted to grayscale. The Otsu algorithm was used for segmentation, while edges in the images were preserved using the Canny edge detection technique, with HOG parameter tuning to obtain feature vectors. The extracted features were used by K-NN for classification. An average recognition accuracy of 97.6% and a computational testing time of 0.39 s were obtained in experiments with the Massey University dataset. Similarly, an average recognition accuracy of 99.0% and a computational testing time of 0.43 s were obtained with the Kaggle dataset. The developed system successfully recognized static English alphabet letters and numbers and outperformed some existing systems.
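The K-NN classification stage can be illustrated with a bare majority vote over feature vectors. A minimal sketch, with made-up 2-D features standing in for the HOG descriptors the paper actually extracts:

```python
from collections import Counter

def knn_predict(train_feats, train_labels, query, k=3):
    """Label a query vector by majority vote among its k nearest neighbors."""
    # rank training feature vectors by Euclidean distance to the query
    dists = sorted(
        (sum((q - t) ** 2 for q, t in zip(query, feat)) ** 0.5, label)
        for feat, label in zip(train_feats, train_labels)
    )
    votes = Counter(label for _, label in dists[:k])
    return votes.most_common(1)[0][0]

feats = [(0, 0), (0, 1), (5, 5), (6, 5)]   # stand-ins for HOG vectors
labels = ["A", "A", "B", "B"]
print(knn_predict(feats, labels, (0.2, 0.4)))  # -> "A"
```

In the actual system the feature vectors would be the HOG descriptors of the segmented, edge-preserved sign images.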
- Conference Article
3
- 10.1109/icaccs54159.2022.9785062
- Mar 25, 2022
Differently abled people who cannot speak or hear often face difficulties when communicating with others, due to the unavailability of translators and sign languages not being common knowledge. Several research works report sign language recognition systems leveraging deep learning algorithms. However, to date, a well-performing, highly generalizable, and cost-effective system is not available for commercial use. In this work, classification of 50 signs from the Indian Sign Language (ISL) is carried out using a convolutional neural network on time-series data. Data for SLR are captured using wireless multi-modality wearable sensors. A novel deep transfer learning algorithm is proposed for personalizing the sign language recognition (SLR) system to a new user. The performance of the proposed approach is tested on 5 new users. The proposed model yields a best average accuracy of 95.6% after applying transfer-learning-based personalization of the network with six samples of each sign from the new user. This is significantly better than the 3% accuracy obtained when the model is tested without applying transfer learning, proving the effectiveness of the proposed approach in handling subject variability.
- Research Article
- 10.37547/ajast/volume05issue07-11
- Jul 1, 2025
- American Journal of Applied Science and Technology
This article is dedicated to the development, technological foundations, and practical applications of modern Sign Language Recognition (SLR) systems. Advanced vision-based systems—particularly architectures such as MediaPipe Holistic, OpenPose, SignAll, Sign Language Transformer, and RWTH-PHOENIX—are analyzed in terms of their algorithmic principles, advantages, and limitations. These systems, based on artificial intelligence and deep learning architectures, enable the spatial-temporal, multimodal, and contextual recognition of sign language glosses. The MediaPipe system provides real-time detection of facial, body, and hand movements, while OpenPose excels at modeling the user’s body pose in 2D and 3D formats. The SignAll system integrates NLP components for translating sign language glosses. SLR systems based on the PHOENIX14T corpus, developed by RWTH Aachen University, are considered a benchmark for sign segmentation. In particular, the Transformer-based Sign Language Transformer model allows for seamless translation of sign language glosses into English text. The article thoroughly addresses issues such as multimodal signal analysis (gesture, pose, facial expression) for more accurate interpretation of sign movements, the creation of a contextual semantic representation model, real-time processing, and platform integration. Additionally, the practical significance of modern SLR systems in education, communication, and human-computer interaction (HCI) is analyzed.
- Research Article
3
- 10.3233/thc-192000
- Jan 1, 2020
- Technology and Health Care
For a traditional vision-based static sign language recognition (SLR) system, arm segmentation is a major factor restricting the accuracy of SLR. To achieve accurate arm segmentation for different bent arm shapes, we designed a segmentation method for a static SLR system based on image processing and combined it with morphological reconstruction. First, skin segmentation was performed using YCbCr color space to extract the skin-like region from a complex background. Then, the area operator and the location of the mass center were used to remove skin-like regions and obtain the valid hand-arm region. Subsequently, the transverse distance was calculated to distinguish different bent arm shapes. The proposed segmentation method then extracted the hand region from different types of hand-arm images. Finally, the geometric features of the spatial domain were extracted and the sign language image was identified using a support vector machine (SVM) model. Experiments were conducted to determine the feasibility of the method and compare its performance with that of neural network and Euclidean distance matching methods. The results demonstrate that the proposed method can effectively segment skin-like regions from complex backgrounds as well as different bent arm shapes, thereby improving the recognition rate of the SLR system.
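The YCbCr skin-segmentation step can be sketched per pixel. The BT.601 conversion below is standard, but the threshold box on the Cb/Cr plane is a commonly cited range for skin, not necessarily the one the authors tuned:

```python
def is_skin(r, g, b):
    """Classify one RGB pixel as skin-like via a YCbCr chrominance box."""
    # BT.601 RGB -> YCbCr (full range); luma Y is not needed for the test
    cb = 128 - 0.168736 * r - 0.331264 * g + 0.5 * b
    cr = 128 + 0.5 * r - 0.418688 * g - 0.081312 * b
    # commonly cited skin cluster in the Cb/Cr plane; tune per dataset
    return 77 <= cb <= 127 and 133 <= cr <= 173

# a 1x2 "image": one skin-toned pixel, one blue pixel
image = [[(200, 150, 120), (0, 0, 255)]]
mask = [[is_skin(*px) for px in row] for row in image]  # [[True, False]]
```

The resulting binary mask is what the paper's later steps (area operator, mass-center test, transverse-distance measure) would operate on.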
- Conference Article
58
- 10.1109/smc.2016.7844675
- Jan 27, 2016
Sign Language Recognition (SLR) system is a novel method that allows hard of hearing people to communicate with society. In this study, an American Sign Language (ASL) recognition system was proposed by using the surface Electromyography (sEMG). The objective of this study is to recognize the American Sign Language alphabet letters and allow users to spell words and sentences. For this purpose, sEMG signals are acquired from subject's right forearm for 27 American Sign Language gestures, 26 English alphabet letters, and one for home position. Time domain, frequency domain (band power), power spectral density (band power), and average power features were used as the feature extraction methods. After feature extraction, Principal Component Analysis (PCA) was applied to obtain uncorrelated features. As a classification method, Support Vector Machine and Ensemble Learning algorithm were used and their performances were compared with tabulated results. In conclusion, the results of this study show that sEMG signal can be used for SLR systems.
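The per-window feature extraction such a pipeline relies on can be sketched on a single sEMG signal window. These three descriptors (mean absolute value, average power, zero crossings) are common time-domain sEMG features chosen here for illustration; the study's exact feature set also includes frequency-domain measures:

```python
def time_domain_features(window):
    """Compute simple time-domain descriptors of one sEMG signal window."""
    n = len(window)
    mav = sum(abs(x) for x in window) / n            # mean absolute value
    power = sum(x * x for x in window) / n           # average power
    zc = sum(1 for a, b in zip(window, window[1:])   # zero crossings
             if a * b < 0)
    return [mav, power, zc]

print(time_domain_features([1, -1, 1, -1]))  # -> [1.0, 1.0, 3]
```

Feature vectors like this, concatenated across forearm channels, would then be decorrelated with PCA before classification.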
- Research Article
- 10.55640/eijmrms-05-07-04
- Jul 1, 2025
- European International Journal of Multidisciplinary Research and Management Studies
This article is dedicated to analyzing advanced approaches in temporal modeling and real-time gesture recognition within sign language recognition (SLR) systems. Sign glosses are expressed through the spatio-temporal characteristics of visual information, which requires the use of sequence-processing models for their automatic recognition. The study primarily evaluates the effectiveness of three key models: Long Short-Term Memory (LSTM) networks, Temporal Convolutional Networks (TCN), and Transformer-based architectures. The article also examines methods applied for real-time analysis of sign glosses, including: Sliding window segmentation of video streams; Self-attention mechanisms for identifying dependencies between gestures; Gloss mapping algorithms for linking sign movements to linguistic units; Ontological integration techniques for enhancing semantic accuracy. Practical results indicate that combining temporal modeling with semantic analysis and contextual verification algorithms ensures continuous and high-accuracy recognition of sign movements. In particular, multimodal systems (video + sensor + gloss) utilizing Transformer-based approaches achieved superior performance in real-time conversion of continuous sign gloss streams into text. The findings of this study hold practical significance for the development of smart assistive devices for automatic sign language translation, interactive interfaces for hearing-impaired users, and specialized SLR platforms for educational and instructional purposes.
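Of the real-time methods listed, the sliding-window segmentation of the video stream is the simplest to sketch. A minimal generator over a frame sequence, where window size and stride are free parameters:

```python
def sliding_windows(frames, size, stride):
    """Yield fixed-length windows over a frame sequence for streaming SLR."""
    for start in range(0, len(frames) - size + 1, stride):
        yield frames[start:start + size]

# 10 frames, windows of 4 frames, advancing 2 frames at a time
for w in sliding_windows(list(range(10)), size=4, stride=2):
    print(w)  # [0, 1, 2, 3], [2, 3, 4, 5], [4, 5, 6, 7], [6, 7, 8, 9]
```

Each window would then be passed to the temporal model (LSTM, TCN, or Transformer) for gloss prediction, with overlapping windows smoothing the boundaries between consecutive signs.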
- Conference Article
3
- 10.1109/icac347590.2019.9036832
- Dec 1, 2019
A sign language recognition (SLR) system is helpful for individuals who cannot talk or hear, as it allows others to communicate with them easily. In this paper we focus on an SLR approach in which our proposed algorithm recognizes hand gestures for routine words such as Morning, Night, Monday, Tuesday, and so forth; in total, 10 routine words are recognized using the proposed technique. To recognize the hand pose, we use a skin color detection algorithm. We then apply a correlation-coefficient algorithm to extract similarity features, which are passed to a neuro-fuzzy (NF) classification algorithm to recognize the words. The presented algorithm was tested in Matlab, and on the 10 routine words the proposed framework achieved a 92% recognition rate.