Abstract

Sign language recognition plays an important role in real-time sign language translation, communication for deaf people, education, and human-computer interaction. However, vision-based sign language recognition faces difficulties such as insufficient data, large network models, and poor real-time performance. We use the Video Transformer Network (VTN) to construct a lightweight sign language translation network. We build a dataset called CSL_BS (Chinese Sign Language-Bank and Station) and a two-way VTN to train on isolated sign language, and compare it with I3D (Inflated 3D ConvNet). Then I3D and VTN are each used as feature extraction modules to extract features from continuous sign language sequences, which serve as input to a continuous sign language translation decoding network (seq2seq). On CSL_BS, the two-way VTN achieves 87.9% accuracy versus 84.2% for the two-way I3D, and recognition speed improves by 46.8%. For continuous sign language translation, VTN_seq2seq achieves 73.5% accuracy versus 71.2% for I3D_seq2seq, with recognition times of 13.91 s and 26.54 s respectively.
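The two-stage pipeline the abstract describes can be sketched as follows: a feature extractor (standing in for VTN or I3D) turns a continuous sign video into a sequence of clip-level feature vectors, which a seq2seq decoder then translates into gloss tokens. This is a minimal illustrative sketch under assumed shapes and a toy random-projection "extractor"; all function names, dimensions, and the vocabulary are assumptions, not the paper's actual implementation.

```python
import numpy as np

CLIP_LEN = 8     # assumed frames per clip fed to the feature extractor
FEAT_DIM = 512   # assumed dimensionality of each clip feature

def extract_clip_features(video: np.ndarray) -> np.ndarray:
    """Stand-in for the VTN/I3D module: map a (T, H, W, C) video
    to a (num_clips, FEAT_DIM) feature sequence."""
    num_clips = video.shape[0] // CLIP_LEN
    rng = np.random.default_rng(0)
    # Toy fixed random projection in place of a learned network.
    proj = rng.standard_normal((CLIP_LEN * video[0].size, FEAT_DIM))
    feats = []
    for i in range(num_clips):
        clip = video[i * CLIP_LEN:(i + 1) * CLIP_LEN].reshape(-1)
        feats.append(clip @ proj / np.sqrt(proj.shape[0]))
    return np.stack(feats)

def seq2seq_decode(features: np.ndarray, vocab: list, max_len: int = 5) -> list:
    """Stand-in for the seq2seq decoding network: greedily emit
    gloss tokens from a crude summary of the feature sequence."""
    rng = np.random.default_rng(1)
    w = rng.standard_normal((features.shape[1], len(vocab)))
    context = features.mean(axis=0)   # crude "encoder" summary vector
    out = []
    for _ in range(max_len):
        scores = context @ w          # toy output projection
        out.append(vocab[int(scores.argmax())])
        context = context * 0.9       # toy decoder state update
    return out

# Dummy 32-frame, 16x16 RGB video; a real system would use raw sign video.
video = np.zeros((32, 16, 16, 3), dtype=np.float32)
feats = extract_clip_features(video)              # shape (4, 512)
glosses = seq2seq_decode(feats, ["bank", "station", "where", "thanks"])
```

In the paper's setting, the feature extractor is the part that differs between the compared systems (VTN_seq2seq vs. I3D_seq2seq), while the seq2seq decoding stage is shared.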
