Abstract

The design of robust multimodal speech recognition systems that combine acoustic and visual cues, extracted using relatively noise-robust alternate speech sensors, has recently been gaining interest in the speech processing research community. The primary objective of this work is to study the exclusive influence of the Lombard effect on Automatic Speech Recognition (ASR) systems, with a view to building robust multimodal ASR systems for adverse environments in the context of Indian languages, which are syllabic in nature. The dataset comprises 145 confusable Consonant-Vowel (CV) syllabic units of the Hindi language, recorded simultaneously using three modalities that capture acoustic and visual speech cues: a Normal acoustic Microphone (NM), a Throat Microphone (TM) and a camera that captures the associated lip movements. The Lombard effect is induced by feeding crowd noise into the speaker's headphones during recording. Hidden Markov Models (HMMs) are built to categorize the CV units by their Place of Articulation (POA), Manner of Articulation (MOA) and vowel, under both clean and Lombard conditions. The unimodal ASR systems built using each speech cue all show a loss in recognition accuracy due to the Lombard effect. To overcome this loss, the complementary speech cues from normal-microphone and throat-microphone Lombard speech, as well as from visual Lombard speech, are used to build three bimodal ASR systems and one trimodal ASR system. Among the ASR systems studied, the trimodal system gives the best recognition accuracy of 98%, 95% and 76% for the vowels, MOA and POA, respectively, with an average improvement of 36% over the unimodal ASR systems and 9% over the bimodal ASR systems.


