Abstract

Lip reading is the task of inferring the content of speech from the movement of the lips, and it has wide application prospects. Thanks to advances in deep learning and the release of large-scale datasets, lip reading has made great progress. However, these large-scale datasets contain only small-angle lip deflection, which differs from conditions in the wild. Most networks trained on these datasets use simple methods to correct lip deflection, which leads to performance degradation in the wild. In this work, we propose a very challenging dataset with large-angle lip deflection and multiple speakers to simulate the wild environment, and we design a lip deflection classifier based on a Convolutional Neural Network and a two-stage corrector based on a Generative Adversarial Network. We use these two models to correct lip deflection and improve recognition accuracy. Our proposed method achieves absolute improvements of 18.3% and 7.4% over no preprocessing and over face alignment alone, respectively, which shows that the proposed network is better suited to correcting large-angle lip deflection in the wild.

