Abstract

The robustness of automatic speech recognition (ASR) systems degrades due to factors such as environmental noise, speaker variability, and channel distortion, among others. Approaches such as speech signal processing, model adaptation, hybrid techniques, and the integration of multiple sources are used for ASR system development. This paper focuses on building a robust ASR system by combining the complementary evidence present in the multiple modalities through which speech is expressed. Speech sounds are produced with lip radiation accompanied by lip movements; recognizing speech from these movements is called Visual Speech Recognition (VSR). A VSR system converts lip movements into spoken words and consists of lip region detection, visual speech feature extraction, and modeling techniques. Robust feature extraction from visual lip movements is a challenging task in a VSR system. Hence, this paper reviews the feature extraction methods and existing databases used for VSR systems. The fusion of visual lip movements with the ASR system at different levels is also presented.
