Abstract
Filter-bank coefficients are the most common features in research on marginalisation approaches to robust speech recognition, because unreliable data are simple to detect in the frequency domain. In this paper, we propose a hybrid approach, based on marginalisation and soft-decision techniques, that uses Mel-frequency cepstral coefficients (MFCCs) instead of filter-bank coefficients. A new technique for estimating the reliability of each cepstral component is also presented. Experimental results show the effectiveness of the proposed approaches.
Highlights
Despite many years of effort, the robustness of speech recognition in noisy environments remains a fundamental unsolved issue in today's automatic speech recognition (ASR) systems
Cepstral features are more compact, more discriminative and, most importantly, nearly decorrelated, so that hidden Markov models (HMMs) can use diagonal covariance matrices effectively
We propose new cepstral marginalisation and cepstral soft-decision approaches for Mel-frequency cepstral coefficients (MFCCs)
Summary
Despite many years of effort, the robustness of speech recognition in noisy environments remains a fundamental unsolved issue in today's automatic speech recognition (ASR) systems. Missing data theory [1, 2, 3, 4] has been proposed as a way to improve the robustness of the ASR decoding process. The Mel-frequency cepstral coefficient (MFCC) [5] representation is probably the most commonly used representation of speech in speech recognition, and it has recently been standardized for distributed speech recognition (DSR) [6]. Cepstral features are more compact, more discriminative and, most importantly, nearly decorrelated, so that hidden Markov models (HMMs) can use diagonal covariance matrices effectively. They usually provide higher baseline performance than filter-bank features. Applying missing data techniques to cepstral features is therefore attractive and natural.
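To make the link between diagonal covariance and missing data techniques concrete, the following is a minimal sketch of generic marginalisation and a simple soft weighting for a single diagonal-covariance Gaussian. It is not the method proposed in the paper: the function names, the boolean `reliable` mask and the per-component `weight` vector are assumptions introduced for illustration, and the soft variant (scaling each component's log-density by a reliability in [0, 1]) is a deliberately simplified stand-in for a soft-decision rule.

```python
import numpy as np


def per_component_loglik(x, mean, var):
    """Log-density of each component under a diagonal-covariance Gaussian."""
    return -0.5 * (np.log(2.0 * np.pi * var) + (x - mean) ** 2 / var)


def marginalised_loglik(x, mean, var, reliable):
    """Hard decision: integrate out unreliable components.

    With a diagonal covariance, integrating a component out is the same
    as dropping its term from the sum of per-component log-densities.
    """
    return float(np.sum(per_component_loglik(x, mean, var)[reliable]))


def soft_loglik(x, mean, var, weight):
    """Soft decision (simplified assumption): weight each component's
    log-density by a reliability score in [0, 1]."""
    return float(np.sum(weight * per_component_loglik(x, mean, var)))


# Hypothetical 3-dimensional feature vector whose last component is corrupted.
x = np.array([0.0, 0.0, 100.0])
mean = np.zeros(3)
var = np.ones(3)
reliable = np.array([True, True, False])

hard = marginalised_loglik(x, mean, var, reliable)
soft = soft_loglik(x, mean, var, np.array([1.0, 1.0, 0.1]))
```

With a binary weight vector the soft variant reduces to the hard one, so the two can share a single reliability estimate that is either thresholded or used directly.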