Abstract

Current lipreading research concentrates on visual signal processing and the optimization of sequence models, while the linguistic content of the sentence text is ignored. To address this problem, we propose Lip-Corrector, a lipreading method that incorporates natural language processing (NLP) technology by applying the BERT model. The front end of the model uses a 3D+2D convolutional neural network (CNN) to extract lip information, the middle end uses a Transformer-based Seq2seq model to make sentence-level predictions, and the back end uses a BERT-based sentence correction method, which is pre-trained on a self-made dataset and then connected to the middle end. Experiments on LRS2 and LRS3, the two largest sentence-level lipreading datasets, show that the model surpasses all the baselines, indicating that lipreading methods combined with NLP technology achieve better results.
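The three-stage structure described above can be illustrated with a minimal sketch. It shows only how the front end, middle end, and back end are chained; it is not the implementation used in this work. All names (`LipreadingPipeline`, `toy_front_end`, `toy_middle_end`, `toy_back_end`) are hypothetical, and each toy stage is a trivial stand-in for the 3D+2D CNN, the Transformer-based Seq2seq model, and the BERT-based corrector respectively.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence

Frame = List[List[float]]      # hypothetical type: one grayscale lip crop, H x W
Features = List[List[float]]   # hypothetical type: one feature vector per frame


@dataclass
class LipreadingPipeline:
    """Chains the three stages named in the abstract (illustrative only)."""
    front_end: Callable[[Sequence[Frame]], Features]  # stand-in for the 3D+2D CNN
    middle_end: Callable[[Features], str]             # stand-in for the Transformer Seq2seq
    back_end: Callable[[str], str]                    # stand-in for the BERT-based corrector

    def transcribe(self, frames: Sequence[Frame]) -> str:
        features = self.front_end(frames)   # visual feature extraction
        draft = self.middle_end(features)   # sentence-level prediction
        return self.back_end(draft)         # text-level correction of the draft


def toy_front_end(frames: Sequence[Frame]) -> Features:
    # Assumption: mean pixel intensity replaces learned CNN features.
    return [[sum(map(sum, f)) / (len(f) * len(f[0]))] for f in frames]


def toy_middle_end(features: Features) -> str:
    # Assumption: a fixed draft containing one homophone-style error
    # replaces real sequence decoding.
    return "i red the book"


def toy_back_end(draft: str) -> str:
    # Assumption: a lookup table replaces the BERT-based correction model.
    fixes = {"red": "read"}
    return " ".join(fixes.get(word, word) for word in draft.split())


pipeline = LipreadingPipeline(toy_front_end, toy_middle_end, toy_back_end)
frames = [[[0.0, 1.0], [1.0, 0.0]] for _ in range(3)]  # three dummy 2x2 frames
print(pipeline.transcribe(frames))  # prints "i read the book"
```

Keeping the back end as a separate text-to-text stage reflects the description that the corrector is pre-trained on its own and only afterwards connected to the middle end.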
