Abstract

Sign language recognition is a fundamental technique for improving communication between native signers and speakers. Current state-of-the-art sign language recognition methods typically apply deep neural network models to learn an optimized projection between sign language videos and sentences in an end-to-end manner. Generally, minibatch training on sequential data requires padding to equalize the varying lengths of sequences. However, this training strategy degrades performance when batch normalization is used in the models, because batch normalization assumes that all inputs are valid. In this study, we propose masked batch normalization, which normalizes input features while masking out dummy signals. We apply masked batch normalization to tracking-based sign language recognition models built on graph convolutional networks. The performance of the proposed method is evaluated in both isolated sign language word recognition and continuous sign language recognition settings. For evaluation, we use two sign language video datasets: WLASL, which includes 2,000 isolated words, and a JSL dataset, which includes videos of 275 isolated words and 113 sentences. The evaluation results show that the proposed method improves tracking-based sign language recognition models in both settings.
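As a rough illustration of the idea summarized above, the following PyTorch-style sketch computes batch statistics only over valid (unpadded) frames and ignores padded positions. The class name, tensor layout, and momentum update are assumptions for illustration; this is not the authors' implementation.

```python
import torch
import torch.nn as nn

class MaskedBatchNorm1d(nn.Module):
    """Batch normalization that excludes padded (dummy) time steps from the
    batch statistics. Illustrative sketch only, not the paper's code."""

    def __init__(self, num_features, eps=1e-5, momentum=0.1):
        super().__init__()
        self.eps = eps
        self.momentum = momentum
        self.weight = nn.Parameter(torch.ones(num_features))
        self.bias = nn.Parameter(torch.zeros(num_features))
        self.register_buffer("running_mean", torch.zeros(num_features))
        self.register_buffer("running_var", torch.ones(num_features))

    def forward(self, x, mask):
        # x:    (batch, channels, time) feature tensor
        # mask: (batch, 1, time), 1.0 for valid frames, 0.0 for padding
        if self.training:
            n = mask.sum()  # total number of valid frames in the batch
            mean = (x * mask).sum(dim=(0, 2)) / n
            var = ((x - mean[None, :, None]) ** 2 * mask).sum(dim=(0, 2)) / n
            with torch.no_grad():
                # Exponential moving average of statistics over valid frames.
                self.running_mean.lerp_(mean, self.momentum)
                self.running_var.lerp_(var, self.momentum)
        else:
            mean, var = self.running_mean, self.running_var
        x_hat = (x - mean[None, :, None]) / torch.sqrt(var[None, :, None] + self.eps)
        out = self.weight[None, :, None] * x_hat + self.bias[None, :, None]
        # Zero out padded positions so dummy frames stay inert downstream.
        return out * mask
```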
