Abstract

Automatic sign language recognition is a challenging task in machine learning and computer vision. Most prior work has focused on recognizing sign language from hand gestures alone. However, body motion and facial gestures also play an essential role in sign language interaction. Taking this into account, we introduce an automatic sign language recognition system based on multiple gestures, including those of the hands, body, and face. We use a depth camera (OAK-D) to obtain the 3D coordinates of the motions and recurrent neural networks for classification. We compare multiple model architectures based on recurrent networks, such as Long Short-Term Memory (LSTM) and Gated Recurrent Units (GRU), and develop a noise-robust approach. For this work, we collected a dataset of 3000 samples covering 30 different signs of the Mexican Sign Language (MSL), each containing 3D feature coordinates of the face, body, and hands. After extensive evaluation and ablation studies, our best model obtained an accuracy of 97% on clean test data and 90% on highly noisy data.
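
To make the data layout and the notion of noisy input concrete, the following is a minimal sketch and not the authors' pipeline. It assumes a sample is a sequence of frames, each frame a list of (x, y, z) keypoints, and it assumes additive zero-mean Gaussian noise as the perturbation; the abstract does not specify the noise model. The function names and the `sigma` parameter are hypothetical.

```python
import random


def add_gaussian_noise(sequence, sigma, seed=None):
    """Return a copy of `sequence` with zero-mean Gaussian noise on every coordinate.

    sequence: list of frames; each frame is a list of (x, y, z) tuples.
    sigma: noise standard deviation (hypothetical parameter, in coordinate units).
    """
    rng = random.Random(seed)
    return [
        [tuple(c + rng.gauss(0.0, sigma) for c in point) for point in frame]
        for frame in sequence
    ]


def flatten_frames(sequence):
    """Flatten each frame into one vector [x0, y0, z0, x1, ...] per time step,
    the shape a recurrent model (LSTM or GRU) would consume step by step."""
    return [[c for point in frame for c in point] for frame in sequence]


# Toy sample: 4 frames with 3 keypoints each (assumed sizes, for illustration only).
clean = [[(0.1 * t, 0.2 * k, 1.0) for k in range(3)] for t in range(4)]
noisy = add_gaussian_noise(clean, sigma=0.05, seed=0)
features = flatten_frames(noisy)
```

Here `features` holds one 9-dimensional vector per frame. Applying such a perturbation to the inputs is one common way to probe or improve robustness to noisy coordinates, though the approach in the paper may differ.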
