Abstract

To address problems with temporal modeling in lip-reading research, a deep learning model is proposed based on spatiotemporal convolutional neural networks (STCNN) and convolutional long short-term memory (ConvLSTM) networks. First, the STCNN learns features from the extracted lip images. The learned features are then passed to the ConvLSTM, which processes them as time-series data. The ConvLSTM outputs are classified with softmax, and the model is optimized with the connectionist temporal classification (CTC) loss function. The model was trained on the GRID dataset, and comparative experiments show that it achieves a word-level recognition accuracy of 95.0%. The experiments indicate that the model can improve lip-reading accuracy.
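For readers unfamiliar with the CTC loss named above, the following is a minimal sketch of the standard CTC forward algorithm. It is not the authors' implementation. It computes the loss for one utterance from per-frame softmax probabilities, like those a softmax output layer produces, and checks the result against brute-force enumeration of all alignments. The function names `ctc_loss` and `brute_force_ctc` are hypothetical. Class 0 is assumed to be the CTC blank. For readability, the sketch works in probability space, whereas practical implementations work in log space to avoid underflow on long sequences.

```python
import math
from itertools import product


def ctc_loss(probs, label, blank=0):
    """CTC negative log-likelihood of `label` given per-frame class probabilities.

    probs: T lists; probs[t][k] is the softmax probability of class k at frame t.
    label: target class indices (must not contain the blank).
    """
    # Extended sequence with blanks interleaved: [blank, l1, blank, l2, ..., blank]
    ext = [blank]
    for c in label:
        ext += [c, blank]
    S, T = len(ext), len(probs)

    # alpha[s]: total probability of all partial alignments ending at ext[s]
    alpha = [0.0] * S
    alpha[0] = probs[0][ext[0]]
    if S > 1:
        alpha[1] = probs[0][ext[1]]

    for t in range(1, T):
        new = [0.0] * S
        for s in range(S):
            total = alpha[s]              # stay on the same symbol
            if s >= 1:
                total += alpha[s - 1]     # advance by one position
            # Skip a blank only between two different non-blank labels
            if s >= 2 and ext[s] != blank and ext[s] != ext[s - 2]:
                total += alpha[s - 2]
            new[s] = total * probs[t][ext[s]]
        alpha = new

    # Valid alignments end on the last label or on the trailing blank
    p = alpha[S - 1] + (alpha[S - 2] if S > 1 else 0.0)
    return math.inf if p == 0.0 else -math.log(p)


def brute_force_ctc(probs, label, blank=0):
    """Reference check: sum the probability of every path that collapses to `label`."""
    T, K = len(probs), len(probs[0])
    total = 0.0
    for path in product(range(K), repeat=T):
        # CTC collapse: merge consecutive repeats, then drop blanks
        merged = [c for i, c in enumerate(path) if i == 0 or c != path[i - 1]]
        if [c for c in merged if c != blank] == list(label):
            prob = 1.0
            for t, c in enumerate(path):
                prob *= probs[t][c]
            total += prob
    return math.inf if total == 0.0 else -math.log(total)


probs = [[0.4, 0.6], [0.5, 0.5]]  # T = 2 frames; classes {0: blank, 1: 'a'}
print(ctc_loss(probs, [1]), brute_force_ctc(probs, [1]))  # both ≈ 0.2231 (= -ln 0.8)
```

The forward recursion sums over all alignments in O(T·S) time, whereas the brute-force check grows as K^T. This difference is why CTC can train sequence models like the STCNN-ConvLSTM pipeline without frame-level alignments between video frames and words.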
