Abstract

This paper presents a framework for designing a hidden Markov model (HMM)-based audio-visual automatic speech recognition system using minimum classification error (MCE) training. The audio and visual HMMs are optimized by MCE training with the generalized probabilistic descent (GPD) method, and their likelihoods are combined using model-dependent stream weights, which are also estimated with the GPD method. Experimental results on speaker-independent isolated word recognition show that GPD optimization of the audio and visual HMMs, together with GPD-based model-dependent stream weights, significantly improves system performance, yielding a 47%-81% error reduction over a conventional system consisting of HMMs trained under the maximum likelihood criterion and globally tied stream weights estimated with the GPD method.
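To make the stream-weighting idea concrete, the following is a minimal sketch rather than the system described above. It assumes that each word model's score is a weighted sum of its audio and visual log-likelihoods, that the two weights sum to one, and that the MCE loss takes the commonly used form of a sigmoid applied to a misclassification measure. The function names, the smoothing constants `eta` and `gamma`, and the toy log-likelihood values are all hypothetical and are not taken from the paper.

```python
import math


def combined_score(audio_ll, visual_ll, audio_weight):
    """Weighted sum of per-stream log-likelihoods.

    Assumption: the visual weight is 1 - audio_weight.
    """
    return audio_weight * audio_ll + (1.0 - audio_weight) * visual_ll


def recognize(audio_ll, visual_ll, audio_weights):
    """Score every word model and return (best_word, all_scores).

    audio_ll, visual_ll: dict mapping word -> log-likelihood of that stream.
    audio_weights: dict mapping word -> audio stream weight. Giving every
    word the same value mimics a globally tied weight; giving each word
    its own value mimics model-dependent weights.
    """
    scores = {
        w: combined_score(audio_ll[w], visual_ll[w], audio_weights[w])
        for w in audio_ll
    }
    return max(scores, key=scores.get), scores


def mce_loss(scores, correct_word, eta=1.0, gamma=1.0):
    """Smoothed error count for one utterance (assumed standard MCE form).

    The misclassification measure d compares the correct model's score
    with a soft maximum over the competing models; d > 0 means an error.
    """
    rivals = [g for w, g in scores.items() if w != correct_word]
    m = max(rivals)  # subtract the max before exp() for numerical stability
    soft_max = m + math.log(
        sum(math.exp(eta * (g - m)) for g in rivals) / len(rivals)
    ) / eta
    d = soft_max - scores[correct_word]
    return 1.0 / (1.0 + math.exp(-gamma * d))


# Toy example with made-up log-likelihoods; the spoken word is "yes".
audio_ll = {"yes": -120.0, "no": -118.0}   # audio alone prefers "no"
visual_ll = {"yes": -60.0, "no": -75.0}    # video alone prefers "yes"

tied = {"yes": 0.9, "no": 0.9}             # one weight shared by all models
per_model = {"yes": 0.7, "no": 0.9}        # one weight per word model

for name, weights in (("tied", tied), ("model-dependent", per_model)):
    best, scores = recognize(audio_ll, visual_ll, weights)
    print(name, best, round(mce_loss(scores, "yes"), 4))
```

In this toy case the tied weight leaves the wrong word on top and the loss above 0.5, while the per-model weights recover the correct word and push the loss toward zero. A GPD-style procedure would adjust the weights (and, in the full framework, the HMM parameters) by small gradient steps that reduce this loss over the training set; that update step is not shown here.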
