Abstract

Automatic lip-reading, the process of decoding spoken language through visual analysis of lip movements, offers a promising avenue for advancing human-computer interaction and accessibility. This research proposes a model that integrates 3D Convolutional Neural Networks (3D-CNNs) with Long Short-Term Memory (LSTM) networks to improve the accuracy and efficiency of lip-reading systems, addressing challenges such as lighting variation, differences in speaker articulation, and linguistic diversity. Traditional 2D-CNNs capture only spatial information and therefore miss the temporal dynamics essential for accurate lip-reading; by pairing 3D convolutions, which extract spatio-temporal features from short frame windows, with LSTMs that model longer-range dependencies across the full sequence, the proposed model achieves notably higher recognition accuracy. Extensive training on a diverse dataset and the use of transfer learning further contribute to the model's robustness and generalization.
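To make the described architecture concrete, the following is a minimal PyTorch sketch of a 3D-CNN front end feeding a bidirectional LSTM classifier. The layer sizes, class count, clip dimensions, and class name (LipReadingNet) are illustrative assumptions for exposition only, not the paper's actual configuration.

    import torch
    import torch.nn as nn

    class LipReadingNet(nn.Module):
        """Hypothetical 3D-CNN + LSTM lip-reading sketch.

        Input: a batch of grayscale mouth-region clips shaped
        (batch, channels=1, frames, height, width).
        """

        def __init__(self, num_classes=500, hidden_size=256):
            super().__init__()
            # 3D convolutions capture spatial and short-range temporal patterns.
            self.frontend = nn.Sequential(
                nn.Conv3d(1, 32, kernel_size=(3, 5, 5), padding=(1, 2, 2)),
                nn.ReLU(),
                nn.MaxPool3d(kernel_size=(1, 2, 2)),  # pool space, keep all frames
                nn.Conv3d(32, 64, kernel_size=(3, 3, 3), padding=1),
                nn.ReLU(),
                nn.MaxPool3d(kernel_size=(1, 2, 2)),
            )
            # The LSTM models longer-range temporal dependencies across frames.
            self.lstm = nn.LSTM(input_size=64, hidden_size=hidden_size,
                                num_layers=2, batch_first=True, bidirectional=True)
            self.classifier = nn.Linear(2 * hidden_size, num_classes)

        def forward(self, x):
            # x: (B, 1, T, H, W) -> (B, 64, T, H', W')
            feats = self.frontend(x)
            # Collapse the spatial dimensions so each frame becomes one vector.
            feats = feats.mean(dim=[3, 4])          # (B, 64, T)
            feats = feats.permute(0, 2, 1)          # (B, T, 64)
            out, _ = self.lstm(feats)               # (B, T, 2 * hidden_size)
            # Classify from the last time step (word-level recognition).
            return self.classifier(out[:, -1, :])

    # Example: a batch of two 29-frame, 64x64 mouth crops.
    model = LipReadingNet(num_classes=10)
    clip = torch.randn(2, 1, 29, 64, 64)
    logits = model(clip)
    print(logits.shape)  # torch.Size([2, 10])

Averaging the spatial dimensions before the LSTM keeps the sketch short; a fuller implementation might instead flatten or attend over the spatial grid per frame.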
