Abstract

Automatic speech recognition (ASR) systems suffer from variations in the acoustic quality of input speech. Speech may be produced in noisy environments, and each speaker has an individual speaking style. Variations can be observed even within a single utterance, or from the same speaker in different moods. All of these uncertainties and variations must be normalized to build a robust ASR system. In this paper, we apply and evaluate different approaches to acoustic quality normalization within an utterance for robust ASR. Several hidden Markov model (HMM)-based systems using utterance-level, word-level, and monophone-level normalization are evaluated against an HMM-SM (subspace method)-based system using monophone-level normalization to normalize the variations and uncertainties in an utterance. The SM can represent fine-structure variations within sub-words as a set of eigenvectors, and therefore performs better than the HMM at the monophone level. Experimental results show that the HMM-SM-based system with monophone-level normalization significantly improves word accuracy over a typical HMM-based system with utterance-level normalization, in both clean and noisy conditions. The results also suggest that monophone-level normalization performs better with the SM than with the HMM.
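The abstract does not give implementation details, but the two ideas it contrasts can be sketched compactly. The Python fragment below is a hypothetical illustration, not the authors' code: it shows (a) utterance-level versus segment-level (e.g., word- or monophone-level) cepstral mean-variance normalization, and (b) a minimal CLAFIC-style subspace method in which each class is represented by the leading eigenvectors of its training data and a test vector is scored by its squared projection onto each class subspace. The function names, the choice of CMVN as the normalization scheme, and the segment-boundary format are all assumptions made for illustration.

```python
# Hypothetical sketch (not the paper's code): normalization granularity
# plus a minimal CLAFIC-style subspace method classifier.
import numpy as np

def cmvn(frames):
    """Mean-variance normalize a (T, D) block of cepstral frames."""
    mu = frames.mean(axis=0)
    sigma = frames.std(axis=0) + 1e-8      # guard against zero variance
    return (frames - mu) / sigma

def utterance_level_norm(frames):
    # One global normalization over the whole utterance.
    return cmvn(frames)

def segment_level_norm(frames, boundaries):
    # Normalize each segment (word or monophone) independently, so that
    # local acoustic variation is removed segment by segment.
    out = np.empty_like(frames)
    for start, end in boundaries:          # e.g. [(0, 23), (23, 51), ...]
        out[start:end] = cmvn(frames[start:end])
    return out

class SubspaceClassifier:
    """CLAFIC-style subspace method: each class is represented by the
    leading right singular vectors (eigenvectors of the autocorrelation
    matrix) of its training vectors; a test vector is assigned to the
    class whose subspace captures the largest share of its energy."""
    def __init__(self, n_dims=8):
        self.n_dims = n_dims
        self.bases = {}                    # class label -> (D, n_dims) basis

    def fit(self, data_by_class):
        for label, X in data_by_class.items():   # X: (N, D) training vectors
            _, _, vt = np.linalg.svd(X, full_matrices=False)
            self.bases[label] = vt[: self.n_dims].T

    def predict(self, x):
        x = x / (np.linalg.norm(x) + 1e-8)
        scores = {label: np.sum((basis.T @ x) ** 2)   # squared projection
                  for label, basis in self.bases.items()}
        return max(scores, key=scores.get)
```

Under this reading, monophone-level normalization corresponds to calling the segment-level routine with phone boundaries, and the HMM-SM system replaces per-state Gaussian scoring with the subspace projection score for each monophone class.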
