Abstract

Exploiting the speech information embedded in facial images has been a significant research topic in recent years, because it provides information complementary to acoustic signals for a wide range of automatic speech recognition (ASR) tasks. Visual information is particularly important in many real applications where acoustic signals are corrupted by environmental noise. This chapter reviews the most recent advances in feature extraction and representation for Visual Speech Recognition (VSR). In comparison with other surveys published in the past decade, this chapter presents a more up-to-date survey and highlights the strengths of two newly developed approaches (i.e., graph-based learning and deep learning) for VSR. In particular, we summarise how these two techniques can be used to overcome one of the most challenging difficulties in this area: how to automatically learn good visual feature representations from facial images to replace the widely used handcrafted features. The chapter concludes by discussing potential visual feature representation solutions that may overcome the remaining challenges in this domain.
