Training Strategies for Isolated Sign Language Recognition


Similar Papers
  • Book Chapter
  • 10.3233/atde240085
Isolated Sign Language Recognition Based on Deep Learning
  • Mar 21, 2024
  • Junjie Wang

Communicating with hearing-impaired individuals poses a significant challenge. However, with the advancement of computer vision, automatic sign language recognition (SLR) is gradually addressing this issue and has made significant progress. One of the key challenges in SLR lies in accurately capturing and interpreting the subtle nuances and variations in sign language gestures. In this study, we focus on recognizing isolated signs from LSA64, a small-scale Argentinian Sign Language dataset. We concatenated a CNN and an LSTM into an end-to-end model for isolated sign language recognition on LSA64, achieving a promising accuracy of nearly 97% while keeping the model compact in size.
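The CNN-then-LSTM pipeline described above can be sketched in a few lines. This is a minimal NumPy illustration, not the authors' implementation: block-average pooling stands in for the CNN, the LSTM is a single bare cell, and the 64-class head simply reflects LSA64's 64 signs; all weights and shapes here are illustrative assumptions.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def frame_features(frame, feat_dim=8):
    """Stand-in for the CNN: average pooling over fixed horizontal blocks."""
    h = frame.shape[0] // feat_dim
    return np.array([frame[i * h:(i + 1) * h].mean() for i in range(feat_dim)])

def lstm_classify(clip, Wx, Wh, b, Wout):
    """Minimal LSTM over per-frame features; last hidden state -> class logits."""
    hidden = Wh.shape[0]
    h = np.zeros(hidden)
    c = np.zeros(hidden)
    for frame in clip:
        x = frame_features(frame)
        z = x @ Wx + h @ Wh + b              # (4*hidden,) gate pre-activations
        i, f, o, g = np.split(z, 4)          # input, forget, output, candidate
        c = sigmoid(f) * c + sigmoid(i) * np.tanh(g)
        h = sigmoid(o) * np.tanh(c)
    return h @ Wout                          # (num_classes,) logits

rng = np.random.default_rng(0)
hidden, feat_dim, num_classes = 16, 8, 64    # 64 signs in LSA64
clip = rng.random((12, 32, 32))              # 12 grayscale frames (toy clip)
logits = lstm_classify(clip,
                       rng.normal(size=(feat_dim, 4 * hidden)) * 0.1,
                       rng.normal(size=(hidden, 4 * hidden)) * 0.1,
                       np.zeros(4 * hidden),
                       rng.normal(size=(hidden, num_classes)))
```

In a trained system the pooling stage would be a learned convolutional stack and the weights would come from backpropagation; the point here is only the shape flow: per-frame features, a recurrent pass over time, and one logit per sign.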

  • Research Article
  • 10.14419/mwb8ve64
Gesture Language Recognition Through Computer Vision and a Spatial-Temporal Mathematical Model
  • Jul 20, 2025
  • International Journal of Basic and Applied Sciences
  • S Manikandan + 5 more

For people with speech and hearing impairments, sign language is a vital form of communication that allows them to communicate and engage with others. Nonetheless, a major obstacle is the general public's limited comprehension of sign language. For the deaf and mute communities, this communication gap frequently results in challenges with social inclusion, education, and career prospects. To solve this problem, researchers are increasingly using deep learning and artificial intelligence (AI) techniques to create automatic sign language recognition (SLR) systems that can instantly translate sign motions into speech or text. This paper presents a hybrid method that combines continuous sign language recognition (CSLR) and isolated sign language recognition into a single deep learning framework. The system uses a Spatial-Temporal Network (STNet) to identify dynamic sign sequences in CSLR and a Convolutional Neural Network (CNN) for isolated sign identification. An ensemble learning technique is included to increase model robustness, and an optimized Inception-based architecture is utilized for isolated sign classification to boost performance. Additionally, a novel Spatial Resonance Module (SRM) refines frame-to-frame feature extraction, and a Multi-Temporal Perception Module (MTPM) strengthens long-range dependency recognition in sign sequences. These advancements contribute to higher accuracy and efficiency in sign language interpretation. Experimental validation of the proposed system was conducted using benchmark datasets, demonstrating superior performance compared to existing state-of-the-art techniques. The model achieved an accuracy of 98.46% in isolated sign recognition and exhibited a 2.9% improvement in CSLR tasks.
The ability to accurately recognize and translate sign language in both isolated and continuous contexts makes this system highly suitable for real-time applications, including assistive communication devices, virtual interpreters, and educational tools. The proposed research has the potential to significantly impact accessibility and inclusivity for individuals with speech and hearing impairments. By integrating deep learning with real-time processing, this system enhances human-computer interaction and fosters seamless communication between sign language users and the broader community. Future research can explore the integration of additional modalities, such as facial expressions and hand movement trajectories, to further refine sign language recognition models and ensure even greater accuracy and adaptability.

  • Research Article
  • 10.1038/s41597-025-04986-x
A large dataset covering the Chinese national sign language for dual-view isolated sign language recognition
  • Apr 19, 2025
  • Scientific Data
  • Peng Jin + 9 more

Isolated Sign Language Recognition (ISLR), which seeks to automatically align sign videos with corresponding glosses, has recently gained considerable attention from the artificial intelligence community. This technology has the potential to bridge the communication gap between hearing people and the deaf community. However, the development of ISLR is hindered by the scarcity of sign language datasets. Moreover, existing ISLR datasets are limited by their provision of a single perspective, which makes hand gesture occlusion difficult to handle. In addition, existing Chinese ISLR datasets, such as DEVISIGN and NMFs-CSL, fail to cover the entire vocabulary of Chinese National Sign Language (CNSL). This greatly obstructs the application of ISLR in the real world. To address these challenges, we introduce a novel word-level sign language dataset for ISLR that encompasses the entire CNSL vocabulary, comprising 6,707 unique signs. Moreover, it provides two perspectives of signers: the front side and the left side. There are ten signers involved in sign video recording, and the processes of sign video recording, annotation and quality assurance were rigorously controlled. To the best of our knowledge, this dataset is the first dual-view Chinese sign language dataset for ISLR that covers all the sign words in CNSL.

  • Conference Article
  • Cited by 8
  • 10.1109/skima47702.2019.8982452
Novel Technique for Isolated Sign Language Based on Fingerspelling Recognition
  • Aug 1, 2019
  • Ahmad Yahya Dawod + 1 more

Sign language is used by deaf and hard-of-hearing people to exchange information within their own community and with other people. Fingerspelling recognition for isolated sign language has attracted research interest in computer vision and human-computer interaction. The need for real-time recognition of isolated sign language has grown with the emergence of better capturing devices such as Kinect sensors. The purpose of this paper is to design a user-independent framework for automatic recognition of American Sign Language that can recognize several one-handed dynamic isolated signs and interpret their meaning. As one contribution, we built raw datasets for the alphabet (A-Z) and numbers (1-20) using the 3D centroid of the left hand (X_L, Y_L, Z_L), switching to the right hand (X_R, Y_R, Z_R) as needed. The proposed approach was tested on gestures involving the left or right hand and achieved better accuracy than a comparison approach. Two machine learning methods, Hidden Conditional Random Field (HCRF) and Random Decision Forest (RDF), are used for the classification part. A third contribution addresses low-lighting conditions and cluttered backgrounds. This work achieves a recognition accuracy of over 99.7%.

  • Conference Article
  • Cited by 6
  • 10.1109/siu.2018.8404374
Isolated sign language recognition with fast hand descriptors
  • May 1, 2018
  • Ogulcan Özdemir + 2 more

Recognition of sign language, the main mode of communication of the hearing impaired, has attracted the attention of researchers working in the field of computer vision in recent years. In this study, we propose a fast alternative to the Improved Dense Trajectories (IDT) method for sign language recognition. In our proposed method, Histogram of Oriented Gradients (HOG), Histogram of Optical Flow (HOF), and Motion Boundary Histogram (MBH) descriptors are computed from the cropped hand regions. Then, Fisher Vectors (FVs) are encoded from each descriptor of each sign video and classified with a linear Support Vector Machine (SVM). We show that our method achieves similar performance to IDT while running ten times faster.
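The Fisher Vector step above is the distinctive part of the pipeline and can be sketched as follows. This simplified version assumes a diagonal GMM has already been fitted to the local descriptors and encodes only the gradients with respect to the component means (full FVs also include weight and variance gradients):

```python
import numpy as np

def fisher_vector_means(descriptors, weights, means, sigmas):
    """Simplified Fisher Vector: gradients of the log-likelihood w.r.t. GMM means.

    descriptors: (N, D) local descriptors (e.g. HOG/HOF/MBH) from one video
    weights: (K,) mixture weights; means, sigmas: (K, D) diagonal Gaussians
    Returns a (K * D,) encoding, L2-normalized.
    """
    N, D = descriptors.shape
    # Posterior responsibilities gamma(n, k) under the diagonal GMM.
    diff = descriptors[:, None, :] - means[None, :, :]          # (N, K, D)
    log_prob = -0.5 * np.sum((diff / sigmas) ** 2
                             + np.log(2 * np.pi * sigmas ** 2), axis=2)
    log_post = np.log(weights)[None, :] + log_prob              # (N, K)
    log_post -= log_post.max(axis=1, keepdims=True)             # stabilize
    gamma = np.exp(log_post)
    gamma /= gamma.sum(axis=1, keepdims=True)
    # Mean-gradient statistics, normalized by N and sqrt of the weights.
    fv = np.einsum('nk,nkd->kd', gamma, diff / sigmas)
    fv /= N * np.sqrt(weights)[:, None]
    fv = fv.ravel()
    return fv / (np.linalg.norm(fv) + 1e-12)

rng = np.random.default_rng(0)
enc = fisher_vector_means(rng.normal(size=(100, 8)),   # 100 toy descriptors
                          np.array([0.5, 0.5]),        # K = 2 components
                          rng.normal(size=(2, 8)),
                          np.ones((2, 8)))
```

The resulting fixed-length vector (one per descriptor type per video) is what the linear SVM consumes, which is why FV encoding pairs naturally with fast linear classifiers.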

  • Conference Article
  • Cited by 1
  • 10.52591/lxai202306186
Impact of Video Length Reduction due to Missing Landmarks on Sign Language Recognition Model
  • Jun 18, 2023
  • Carlos Vasquez + 1 more

Sign Language Processing (SLP) has become an increasingly challenging field, particularly in the areas of sign language recognition (SLR), translation, and production. One of the primary challenges in SLP is pose estimation, which can be impacted by missing landmarks due to occlusions or limitations in the model’s performance. In this study, we propose a method for evaluating the impact of missing landmarks on the performance of an SLR transformer-based model for the Isolated Sign Language Recognition (ISLR) task. We train and test the Spoter model on two subsets of Peruvian Sign Language datasets, and evaluate its performance using top-1 and top-5 validation accuracy. The study finds that removing frames with missing landmarks did not significantly impact accuracy in most of the cases, which suggests that additional preprocessing steps may not be necessary to deal with missing landmarks in this particular task. These findings contribute to the ongoing research in SLP and highlight potential avenues for improving SLP tasks.
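The preprocessing step the study evaluates, removing frames whose landmarks are missing, reduces to a simple mask over the frame axis. A minimal sketch, assuming missing landmarks are encoded as NaN (the encoding used by common pose-estimation exports, though the paper's exact format may differ):

```python
import numpy as np

def drop_frames_with_missing_landmarks(landmarks):
    """Remove frames in which any landmark coordinate is missing (NaN).

    landmarks: (T, L, 2) array of T frames, each with L 2-D pose landmarks.
    Returns the kept frames and a boolean keep-mask over the original frames.
    """
    keep = ~np.isnan(landmarks).any(axis=(1, 2))
    return landmarks[keep], keep

seq = np.zeros((5, 3, 2))
seq[1, 0, 0] = np.nan   # frame 1 has a missing landmark
seq[4, 2, 1] = np.nan   # frame 4 too
kept, mask = drop_frames_with_missing_landmarks(seq)
```

The study's finding is that feeding the shortened `kept` sequence to the transformer, rather than the full sequence, did not significantly change accuracy in most cases.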

  • Research Article
  • 10.59313/jsr-a.1367212
Deep learning-based isolated sign language recognition: a novel approach to tackling communication barriers for individuals with hearing impairments
  • Dec 31, 2023
  • Journal of Scientific Reports-A
  • Naciye Nur Arslan + 2 more

Sign language is a primary and widely used means of communication for individuals with hearing impairments. Current sign language recognition techniques need further development. In this research, we present a novel deep learning architecture for achieving significant advancements in sign language recognition by recognizing isolated signs. The study utilizes the Isolated Sign Language Recognition (ISLR) dataset from 21 hard-of-hearing participants. This dataset comprises 250 isolated signs, with the x, y, and z coordinates of 543 landmarks per frame obtained using the MediaPipe Holistic solution. With approximately 100,000 videos, this dataset presents an essential opportunity for applying deep learning methods to sign language recognition. We present comparative results from experiments exploring different batch sizes, kernel sizes, frame sizes, and numbers of convolutional layers, achieving an accuracy of 83.32% on the test set.

  • Conference Article
  • Cited by 10
  • 10.1109/aim.2012.6266025
An encoding and identification approach for the static sign language recognition
  • Jul 1, 2012
  • Fu-Hua Chou + 1 more

Sign language identification and recognition consists of gesture image detection and hand gesture recognition. Gesture image detection locates the palm and fingers in the captured pictures and rotates them to the appropriate posture. Both are important pre-processing steps for sign language identification and recognition; without them, the correctness rate of sign language recognition algorithms drops to an unacceptable level. This paper presents novel processing algorithms for gesture image detection and recognition. The detection process rotates an askew gesture to the upright position and deletes the elbow and forearm parts from the captured pictures. The recognition process includes two phases: model construction and sign language identification. In the model construction phase, each static hand gesture of the sign language is modeled by a Gaussian mixture model, and an unknown gesture image is identified by Gaussian model matching. Based on the presented static sign language detection and recognition algorithms, the correct recognition rate is about 94% on average.
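The two-phase scheme above (model construction, then identification) can be illustrated with a minimal sketch: fit one diagonal Gaussian per gesture class (a one-component mixture, for brevity) and identify an unknown gesture by maximum log-likelihood. This illustrates the general approach, not the paper's exact model:

```python
import numpy as np

def fit_class_gaussians(X, y):
    """Model construction: one diagonal Gaussian per gesture class."""
    params = {}
    for c in np.unique(y):
        Xc = X[y == c]
        params[c] = (Xc.mean(axis=0), Xc.var(axis=0) + 1e-6)  # mean, variance
    return params

def classify(x, params):
    """Identification: pick the class with the highest log-likelihood."""
    def loglik(mu, var):
        return -0.5 * np.sum((x - mu) ** 2 / var + np.log(2 * np.pi * var))
    return max(params, key=lambda c: loglik(*params[c]))

# Toy data: two gesture classes with well-separated feature clusters.
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0, 1, (50, 4)), rng.normal(5, 1, (50, 4))])
y = np.array([0] * 50 + [1] * 50)
params = fit_class_gaussians(X, y)
pred = classify(np.full(4, 5.0), params)
```

A multi-component mixture per class generalizes this by summing component likelihoods, which is what lets a GMM capture multi-modal variation in a single gesture.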

  • Research Article
  • Cited by 46
  • 10.1109/lsp.2018.2797228
A Novel Chinese Sign Language Recognition Method Based on Keyframe-Centered Clips
  • Mar 1, 2018
  • IEEE Signal Processing Letters
  • Shiliang Huang + 3 more

Isolated sign language recognition (SLR) is a long-standing research problem. Existing methods use ambiguous, largely redundant data to represent a sign, ignoring the fact that only a small amount of key information is needed to represent the sign efficiently. Furthermore, inclusion of redundant information may result in inefficiency and difficulty in modeling the long-term dependencies for SLR. This letter delivers a novel sequence-to-sequence learning method based on keyframe-centered clips (KCCs) for Chinese SLR. Unlike conventional methods, only key information is used to represent a sign. The frames-to-word task is transformed into a KCCs-to-subwords task, allowing different attention to be paid to different parts of the input data. Empirically, the proposed method significantly outperforms state-of-the-art SLR systems on our dataset containing 310 Chinese sign language words.

  • Research Article
  • 10.3233/jifs-223601
Multi-state feature optimization of sign glosses for continuous sign language recognition
  • Oct 4, 2023
  • Journal of Intelligent & Fuzzy Systems
  • Tao Lin + 5 more

Vision-based Continuous Sign Language Recognition (CSLR) is a challenging, weakly supervised task aimed at segmenting sign language from weakly annotated image stream sequences for recognition. Compared with Isolated Sign Language Recognition (ISLR), the biggest challenge of this task is that the image stream sequences have ambiguous time boundaries. Recent CSLR works have shown that visual-level sign language recognition hinges on image stream feature extraction and feature alignment, and that overfitting is the most critical problem in the CSLR training process. After investigating recent advanced CSLR models, we identified that the key to this problem is adequate training of the feature extractor. Therefore, this paper proposes a CSLR model with Multi-state Feature Optimization (MFO), based on a Fully Convolutional Network (FCN) and Connectionist Temporal Classification (CTC). The MFO mechanism supervises the multiple states of each sign gloss during modeling and provides more refined labels for training the CTC decoder, which effectively mitigates overfitting while also significantly reducing training time. We validate the MFO method on a popular CSLR dataset and demonstrate that the model achieves better performance.

  • Research Article
  • 10.7717/peerj-cs.2054
Isolated sign language recognition through integrating pose data and motion history images
  • May 21, 2024
  • PeerJ Computer Science
  • Ali Akdağ + 1 more

This article presents an innovative approach for the task of isolated sign language recognition (SLR); this approach centers on the integration of pose data with motion history images (MHIs) derived from these data. Our research combines spatial information obtained from body, hand, and face poses with the comprehensive details provided by three-channel MHI data concerning the temporal dynamics of the sign. Particularly, our developed finger pose-based MHI (FP-MHI) feature significantly enhances the recognition success, capturing the nuances of finger movements and gestures, unlike existing approaches in SLR. This feature improves the accuracy and reliability of SLR systems by more accurately capturing the fine details and richness of sign language. Additionally, we enhance the overall model accuracy by predicting missing pose data through linear interpolation. Our study, based on the randomized leaky rectified linear unit (RReLU) enhanced ResNet-18 model, successfully handles the interaction between manual and non-manual features through the fusion of extracted features and classification with a support vector machine (SVM). This innovative integration demonstrates competitive and superior results compared to current methodologies in the field of SLR across various datasets, including BosphorusSign22k-general, BosphorusSign22k, LSA64, and GSL, in our experiments.
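A motion history image itself is straightforward to compute: pixels where the inter-frame difference exceeds a threshold are stamped with the maximum intensity, and all other pixels decay toward zero, so recent motion appears brighter than old motion. A minimal single-channel sketch (the paper uses three-channel MHIs and a finger-pose variant; the parameter values here are illustrative):

```python
import numpy as np

def motion_history_image(frames, tau=255, delta=32, thresh=30):
    """Motion History Image over a clip of grayscale frames.

    Pixels whose frame-to-frame difference exceeds `thresh` are set to `tau`;
    all other pixels decay by `delta` per frame, floored at 0.
    """
    mhi = np.zeros_like(frames[0], dtype=np.float64)
    prev = frames[0].astype(np.float64)
    for frame in frames[1:]:
        cur = frame.astype(np.float64)
        motion = np.abs(cur - prev) > thresh
        mhi = np.where(motion, float(tau), np.maximum(mhi - delta, 0.0))
        prev = cur
    return mhi

# Toy clip: a bright patch slides one pixel to the right per frame.
frames = np.zeros((4, 8, 8), dtype=np.uint8)
for t in range(4):
    frames[t, 2:4, t:t + 2] = 200
mhi = motion_history_image(frames)
```

In the resulting image the most recently moved columns hold the full value 255 while earlier motion has decayed, which is exactly the temporal-dynamics summary the classifier consumes.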

  • Research Article
  • 10.3390/electronics14234589
Unified Spatiotemporal Detection for Isolated Sign Language Recognition Using YOLO-Act
  • Nov 23, 2025
  • Electronics
  • Nada Alzahrani + 2 more

Isolated Sign Language Recognition (ISLR), which focuses on identifying individual signs from sign language videos, presents substantial challenges due to small and ambiguous hand regions, high visual similarity among signs, and large intra-class variability. This study investigates the adaptability of YOLO-Act, a unified spatiotemporal detection framework originally developed for generic action recognition in videos, when applied to large-scale sign language benchmarks. YOLO-Act jointly performs signer localization (identifying the person signing within a video) and action classification (determining which sign is performed) directly from RGB sequences, eliminating the need for pose estimation or handcrafted temporal cues. We evaluate the model on the WLASL2000 and MSASL1000 datasets for American Sign Language recognition, achieving Top-1 accuracies of 67.07% and 81.41%, respectively. The latter represents a 3.55% absolute improvement over the best-performing baseline without pose supervision. These results demonstrate the strong cross-domain generalization and robustness of YOLO-Act in complex multi-class recognition scenarios.

  • Research Article
  • 10.3390/app152111571
Isolated German Sign Language Recognition for Classifying Polar Answers Using Landmarks and Lightweight Transformers
  • Oct 29, 2025
  • Applied Sciences
  • Cristina Luna-Jiménez + 4 more

Sign Languages are the primary communication modality of deaf communities, yet building effective Isolated Sign Language Recognition (ISLR) systems remains difficult under data limitations. In this work, we curated a sub-dataset from the DGS-Korpus focused on recognizing affirmations and negations (polar answers) in German Sign Language (DGS). We designed lightweight transformer models using landmark-based inputs and evaluated them on two tasks: the binary classification of affirmations versus negations (binary semantic recognition) and the multi-class recognition of sign variations expressing positive or negative replies (multi-class gloss recognition). The main contribution of the article, hence, relies on the exploration of models for performing polar answer recognition in DGS and the exploration of differences between performing multi-class or binary class classification. Our best binary model achieved an accuracy of 97.71% using only hand landmarks without Positional Encoding, highlighting the potential of lightweight landmark-based transformers for efficient ISLR in constrained domains.
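The landmark-based transformer idea, minus positional encoding as in the paper's best binary model, can be sketched as a single self-attention head followed by mean pooling and a linear head. Everything below (dimensions, random weights, the 84-dimensional two-hand landmark input) is an illustrative assumption, not the authors' architecture:

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def attention_classifier(landmarks, Wq, Wk, Wv, Wout):
    """Single-head self-attention over a landmark sequence, mean-pooled,
    then a linear head. No positional encoding, per the paper's best setup."""
    Q, K, V = landmarks @ Wq, landmarks @ Wk, landmarks @ Wv
    attn = softmax(Q @ K.T / np.sqrt(Q.shape[1]))      # (T, T) attention map
    ctx = attn @ V                                     # (T, d) contextualized
    return ctx.mean(axis=0) @ Wout                     # (num_classes,) logits

rng = np.random.default_rng(2)
D, d, num_classes = 21 * 2 * 2, 16, 2   # 21 landmarks x 2 hands x (x, y); yes/no
seq = rng.normal(size=(10, D))          # 10 frames of hand landmarks (toy)
logits = attention_classifier(seq,
                              rng.normal(size=(D, d)),
                              rng.normal(size=(D, d)),
                              rng.normal(size=(D, d)),
                              rng.normal(size=(d, num_classes)))
```

Without positional encoding the model treats the sequence as order-invariant up to the attention weights, which is plausible for polar answers whose identity depends more on hand shape than on precise frame ordering.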

  • Research Article
  • Cited by 1
  • 10.3233/jifs-230528
The use of thematic context-based deep learning in discourse expression of sports news
  • Nov 4, 2023
  • Journal of Intelligent & Fuzzy Systems
  • Yefei Liu

Sports news is a type of discourse that is characterized by a specific vocabulary, style, and tone, and it is typically focused on conveying information about sporting events, athletes, and teams. Thematic context-based deep learning is a powerful approach that can be used to analyze and interpret various forms of natural language, including the discourse expression of sports news. An application model of sign language and lip language recognition based on deep learning is proposed to help people with hearing impairments easily obtain sports news content. First, the lip language recognition system is constructed; next, the lightweight MobileNet network combined with Long Short-Term Memory (LSTM) is used to extract lip-reading features. A ResNet-50 residual network structure is adopted to extract the features of sign language; finally, the convergence, accuracy, precision, and recall of the model are verified. The results show that the loss on the training and test sets converges gradually as the number of iterations increases; the lip language and sign language recognition models stabilize after 14 and 12 iterations, respectively, suggesting a better convergence effect for sign language recognition. The accuracy of sign language recognition and lip language recognition is 98.9% and 87.7%, respectively. In sign language recognition, the recognition accuracy for the numbers 1, 2, 4, 6, and 8 reaches 100%. In lip language recognition, the recognition accuracy for the numbers 2, 3, and 9 is relatively higher. This work can help hearing-impaired people quickly obtain the relevant content in sports news videos and also support their communication.

  • Research Article
  • Cited by 117
  • 10.3390/s18103554
American Sign Language Recognition Using Leap Motion Controller with Machine Learning Approach.
  • Oct 19, 2018
  • Sensors
  • Teak-Wei Chong + 1 more

Sign language is intentionally designed to allow deaf and dumb communities to convey messages and to connect with society. Unfortunately, learning and practicing sign language is not common among society; hence, this study developed a sign language recognition prototype using the Leap Motion Controller (LMC). Many existing studies have proposed methods for incomplete sign language recognition, whereas this study aimed for full American Sign Language (ASL) recognition, which consists of 26 letters and 10 digits. Most of the ASL letters are static (no movement), but certain ASL letters are dynamic (they require certain movements). Thus, this study also aimed to extract features from finger and hand motions to differentiate between the static and dynamic gestures. The experimental results revealed that the sign language recognition rates for the 26 letters using a support vector machine (SVM) and a deep neural network (DNN) are 80.30% and 93.81%, respectively. Meanwhile, the recognition rates for a combination of 26 letters and 10 digits are slightly lower, approximately 72.79% for the SVM and 88.79% for the DNN. As a result, the sign language recognition system has great potential for reducing the gap between deaf and dumb communities and others. The proposed prototype could also serve as an interpreter for the deaf and dumb in everyday life in service sectors, such as at the bank or post office.
