Abstract

Sign language signals carry hierarchically related information over both short and long distances. Because of this intricate temporal correlation in the input sequences, Chinese sign language recognition (SLR) poses a modeling challenge. Conventional encoders based on recurrent networks cannot discover and exploit the hierarchical structure of sign language well. In this paper, we propose a novel encoder-decoder method based on boundary-adaptive learning for Chinese SLR. In the proposed method, a boundary-adaptive encoder (BAE) encodes the hierarchical structure of the sign language signal. To model long sign language sequences more efficiently, a location-based window attention model is used in the decoding phase, which generates more effective weight coefficients. In addition, we use sign language subword units to realize both isolated and continuous Chinese SLR within the same sequence learning framework. Theoretical analysis and experimental results demonstrate the effectiveness and superiority of the proposed method.
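To make the idea of location-based window attention concrete, the following is a minimal sketch of generic local attention, in which the decoder scores only the encoder states inside a window around a given position. It is not the formulation used in the paper: the function name `window_attention`, the dot-product scoring, and the parameters `center` and `half_width` are all assumptions chosen for illustration.

```python
import math


def window_attention(query, keys, center, half_width):
    """Attend only to keys[center - half_width : center + half_width + 1].

    Hypothetical helper: dot-product scoring and a fixed window are
    illustrative assumptions, not the paper's attention model.
    """
    lo = max(0, center - half_width)
    hi = min(len(keys), center + half_width + 1)
    if lo >= hi:
        raise ValueError("window does not overlap the encoder sequence")

    # Dot-product scores, computed only for encoder states inside the window.
    scores = [sum(q * k for q, k in zip(query, keys[t])) for t in range(lo, hi)]

    # Numerically stable softmax over the window.
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)

    # Positions outside the window keep a weight of exactly zero.
    weights = [0.0] * len(keys)
    for i, t in enumerate(range(lo, hi)):
        weights[t] = exps[i] / total

    # Context vector: weighted sum of the encoder states.
    dim = len(keys[0])
    context = [sum(weights[t] * keys[t][d] for t in range(lo, hi)) for d in range(dim)]
    return weights, context


# Toy encoder states: 10 time steps, 2 features each.
keys = [[float(t), 1.0] for t in range(10)]
weights, context = window_attention([0.1, 0.0], keys, center=5, half_width=2)
```

Because scores are computed for at most `2 * half_width + 1` positions per decoding step, the cost of each step does not grow with the length of the input sequence, which is the efficiency argument usually made for windowed attention on long sequences.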

Introduction

Sign language is the most important means of communication with deaf-mute people, and sign language recognition (SLR) is the task of advancing this communication with the help of computer technology. SLR comprises isolated SLR, whose recognition targets are isolated sign words, and continuous SLR, whose targets are continuous sign sentences. Isolated SLR, the earliest research task in SLR, draws on many ideas in feature extraction and temporal modeling from action recognition [4]. Because of their strong feature-extraction ability, methods based on convolutional neural networks (CNN) or 3D-CNN are widely used in SLR [2], [7], [9]. A series of sequence-processing methods based on recurrent neural networks (RNN) or long short-term memory (LSTM) have been applied to isolated SLR [9], [11], [12], [52].
