Abstract

Sign language recognition (SLR) is a useful tool for the deaf-mute to communicate with the outside world. Although many SLR methods have been proposed and have demonstrated good performance, continuous SLR (CSLR) is still challenging. Meanwhile, due to the heavy occlusions and closely interacting motions, there is a higher requirement for the real-time efficiency of CSLR. Therefore, the performance of CSLR needs further improvement. The highlights include: (1) to overcome these challenges, this paper proposes a novel video-based CSLR framework. This framework consists of three components: an OpenPose-based skeleton stream extraction module, a RGB stream extraction module, and a combination module of the BiLSTM network and the conditional hidden Markov model (CHMM) for CSLR. (2) A new residual network with Squeeze-and-Excitation blocks (SEResNet50) for video sequence feature extraction. (3) This paper combines the SEResNet50 module with the BiLSTM network to extract the feature information from video streams with different modalities. To evaluate the effectiveness of our proposed framework, experiments are conducted on two CSL datasets. The experimental results indicate that our method is superior to the methods in the literature.

Full Text
Published version (Free)

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call