Abstract

Sign language recognition (SLR) bridges the communication gap between hearing-impaired people and hearing people. However, existing SLR systems either cannot provide continuous recognition or suffer from low recognition accuracy due to the difficulty of sign segmentation and the insufficiency of capturing both finger and arm motions. The latest system, SignSpeaker, has a significant limitation: it cannot reliably recognize two-handed signs with only one smartwatch. To address these problems, this paper designs a novel real-time end-to-end SLR system, called DeepSLR, to translate sign language into voice and help people “hear” sign language. Specifically, two armbands, each embedded with an IMU sensor and multi-channel sEMG sensors, are attached to the forearms to capture both coarse-grained arm movements and fine-grained finger motions. We propose an attention-based encoder-decoder model with a multi-channel convolutional neural network (CNN) to realize accurate, scalable, end-to-end continuous SLR without sign segmentation. We have implemented DeepSLR on a smartphone and evaluated its effectiveness through extensive experiments. The average word error rate of continuous sentence recognition is 10.8 percent, and it takes less than 1.1 s to detect the signals and recognize a sentence of 4 sign words, validating the recognition efficiency and real-time capability of DeepSLR in real-world scenarios.
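
The abstract does not include implementation details, but the following is a minimal, hypothetical PyTorch sketch of the kind of architecture it describes: a multi-channel CNN front end over windowed sEMG/IMU signals feeding an attention-based encoder-decoder that emits sign words. All channel counts, window sizes, layer widths, and the vocabulary size are illustrative assumptions, not the authors' actual configuration.

```python
# Hypothetical sketch (not the authors' released code): a multi-channel CNN
# front end over sEMG/IMU windows, a recurrent encoder, and an attention-based
# decoder that emits sign-word tokens. All sizes below are assumptions.
import torch
import torch.nn as nn

class MultiChannelCNNEncoder(nn.Module):
    """1-D CNN over each time window of sEMG + IMU channels, then a GRU encoder."""
    def __init__(self, in_channels=14, hidden=128):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv1d(in_channels, 64, kernel_size=5, padding=2), nn.ReLU(),
            nn.Conv1d(64, hidden, kernel_size=5, padding=2), nn.ReLU(),
            nn.AdaptiveAvgPool1d(1),           # one feature vector per window
        )
        self.rnn = nn.GRU(hidden, hidden, batch_first=True, bidirectional=True)

    def forward(self, x):                      # x: (batch, windows, channels, samples)
        b, t, c, s = x.shape
        feats = self.conv(x.reshape(b * t, c, s)).squeeze(-1).reshape(b, t, -1)
        return self.rnn(feats)[0]              # (batch, windows, 2 * hidden)

class AttentionDecoder(nn.Module):
    """GRU decoder with additive attention over encoder states; emits sign words."""
    def __init__(self, vocab_size=100, hidden=128):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, hidden)
        self.attn = nn.Linear(2 * hidden + hidden, 1)
        self.rnn = nn.GRUCell(2 * hidden + hidden, hidden)
        self.out = nn.Linear(hidden, vocab_size)

    def forward(self, enc, tokens):            # enc: (b, t, 2h); tokens: (b, L)
        b, t, _ = enc.shape
        h = enc.new_zeros(b, self.rnn.hidden_size)
        logits = []
        for step in range(tokens.size(1)):
            e = self.embed(tokens[:, step])
            scores = self.attn(torch.cat([enc, h.unsqueeze(1).expand(-1, t, -1)], -1))
            ctx = (torch.softmax(scores, dim=1) * enc).sum(1)   # attention context
            h = self.rnn(torch.cat([ctx, e], -1), h)
            logits.append(self.out(h))
        return torch.stack(logits, dim=1)      # (b, L, vocab)

# Example forward pass on dummy data: 8 sEMG + 6 IMU channels, 20 windows of 64 samples.
enc, dec = MultiChannelCNNEncoder(), AttentionDecoder()
signals = torch.randn(2, 20, 14, 64)
prev_tokens = torch.zeros(2, 5, dtype=torch.long)              # teacher-forced inputs
print(dec(enc(signals), prev_tokens).shape)                    # torch.Size([2, 5, 100])
```

Because the decoder attends over the whole encoded signal at every output step, no explicit sign segmentation is needed, which matches the end-to-end, segmentation-free recognition claimed in the abstract.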
