Abstract

Continuous sign language recognition is challenging due to coarticulatory distortions, which occur at the beginning and end of each gesture. These distortions depend on the temporal context and introduce additional intraclass variability. To address this issue, a new approach is proposed that extracts, from the image sequence, segments corresponding to the undistorted parts of gestures. This should simplify the task by reducing it to the easier problem of isolated gesture recognition. The proposed approach uses deep reinforcement learning for segmentation and a novel image sequence processing scheme that extracts gradient changes over time. To evaluate the method, a dataset recorded by deaf people and annotated according to the proposed approach was prepared. The proposed deep learning architectures achieved leave-one-subject-out recognition accuracies in the range of 0.70 to 0.76. Since direct comparison with other works is not possible, the authors also propose additional evaluation protocols to examine the approach thoroughly. This work will be continued, and the authors' main aspiration is to create an integrated framework that converts raw RGB video into a string of words representing the sign language user's intentions.
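
The abstract does not detail the temporal-gradient scheme; the snippet below is only a minimal Python sketch, under the assumption that "gradient changes over time" can be approximated by per-pixel differences between consecutive frames. The function name and the dummy clip are hypothetical and not taken from the paper.

import numpy as np

def temporal_gradients(frames):
    # Illustrative assumption: approximate "gradient changes over time"
    # with forward differences along the time axis.
    # frames: array of shape (T, H, W), grayscale values in [0, 1].
    frames = np.asarray(frames, dtype=np.float32)
    return np.diff(frames, axis=0)  # shape (T - 1, H, W)

# Hypothetical usage on a dummy 10-frame clip of 64x64 images.
clip = np.random.rand(10, 64, 64).astype(np.float32)
print(temporal_gradients(clip).shape)  # (9, 64, 64)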
