Abstract

A novel multimodal fusion approach is proposed for Chinese sign language (CSL) recognition. The framework, termed the LSTM2+CHMM model, uses dual long short-term memory (LSTM) networks and a coupled hidden Markov model (CHMM) to fuse hand and skeleton sequence information. The first novel contribution is a hand segmentation algorithm that combines power rate transforms with RGB-D image fusion, which overcomes common limitations such as complex backgrounds, inconsistent lighting, and variable skin tones. As a result, the proposed skeleton-hand fusion framework can be applied to vision-based sign language recognition (SLR) for non-specific people in non-specific environments. Finally, the LSTM2+CHMM model combines probability theory with a neural network to provide a unified methodology for multiple-sequence fusion. The proposed SLR framework was tested on two CSL datasets, and the experimental results demonstrate its effectiveness.
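
The abstract only sketches the segmentation step. Below is a minimal illustrative version in Python (using OpenCV and NumPy, which the paper does not necessarily use), assuming the power rate transform is a per-pixel power-law (gamma) correction applied for lighting compensation, and that the RGB-D fusion gates a skin-color mask with a plausible depth range. The threshold values and the function names (power_rate_transform, segment_hand) are hypothetical, not taken from the paper.

    import cv2
    import numpy as np

    def power_rate_transform(gray, gamma=0.6):
        """Per-pixel power-law (gamma) correction to compensate for uneven
        lighting. Illustrative only: the paper's exact transform and
        parameters are not given here."""
        norm = gray.astype(np.float32) / 255.0
        return np.power(norm, gamma)

    def segment_hand(rgb, depth, depth_near=0.3, depth_far=1.2, gamma=0.6):
        """Rough RGB-D hand segmentation sketch:
        1) lighting compensation via the power rate transform,
        2) skin-color thresholding in YCrCb space,
        3) depth gating to suppress complex backgrounds."""
        # 1) Lighting compensation on the luminance channel
        ycrcb = cv2.cvtColor(rgb, cv2.COLOR_BGR2YCrCb)
        y = (power_rate_transform(ycrcb[:, :, 0], gamma) * 255).astype(np.uint8)
        ycrcb[:, :, 0] = y

        # 2) Skin-color mask (Cr/Cb ranges are a common heuristic,
        #    not the paper's values)
        skin = cv2.inRange(ycrcb, (0, 133, 77), (255, 173, 127))

        # 3) Depth gate: keep pixels within a plausible signing distance
        #    (depth assumed to be in meters)
        depth_mask = ((depth > depth_near) & (depth < depth_far)).astype(np.uint8) * 255

        return cv2.bitwise_and(skin, depth_mask)

In the full framework described in the abstract, the segmented hand sequence and the skeleton sequence would each feed one of the two LSTM streams before being fused at the CHMM level.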

Highlights

  • Vision-based sign language recognition (SLR) is currently an active area of research in the field of artificial intelligence [1]–[18]

  • The proposed Chinese sign language (CSL) recognition model was compared with conventional SLR algorithms, including the Gaussian mixture model-hidden Markov model (GMM-HMM) [27], adaptive HMM [22], 3D convolutional neural network (3D-CNN) [5], improved dense trajectories with a support vector machine (iDT+SVM) [48], and space-time interest points with an SVM (STIP+SVM) [47], using the second CSL dataset

  • Table 4 indicates that recognition algorithms based on neural networks, such as the 3D-CNN and the LSTM2+coupled hidden Markov model (CHMM), generally perform better than HMM-based methods, reaching maximum accuracies of 79.33% and 82.55%, respectively


Introduction

Vision-based sign language recognition (SLR) is currently an active area of research in the field of artificial intelligence [1]–[18]. SLR remains challenging because critical technologies needed for high-accuracy recognition, such as human-computer interfacing, are still being developed. Existing techniques are often designed for specific people or environments, which limits their robustness, so precise SLR under non-specific conditions is still needed. SLR involves multiple complex problems, such as human-computer interaction and pattern recognition, which have attracted the attention of experts in multiple fields [18], [19]. Other challenges include variations in data collection and interpretation, such as subtle differences in gestures between individuals, which make it difficult to establish a uniform SLR model [19].

