Robust Speech Recognition by Integrating Speech Separation and Hypothesis Testing

S Srinivasan, Deliang Wang

doi:10.1109/icassp.2005.1415057

Abstract

Missing-data methods aim to make speech recognition more robust by distinguishing between reliable and unreliable data in the time-frequency domain. Such methods require a binary mask that labels each time-frequency region of a noisy speech signal as reliable if it contains more speech energy than noise energy and as unreliable otherwise. Current methods for estimating the mask rely mainly on bottom-up speech separation cues such as harmonicity, and they produce labeling errors that degrade recognition performance. We propose a two-stage recognition system to improve mask estimation and produce better recognition results. First, an n-best lattice consistent with the speech separation mask is generated. The lattice is then re-scored by expanding the mask, using a model-based hypothesis test to determine the reliability of individual time-frequency regions. Systematic evaluations show a significant improvement in recognition performance compared with that obtained using speech separation alone.
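The mask definition in the abstract can be illustrated with a minimal sketch. It assumes the speech and noise energies of each time-frequency unit are known separately, which holds only for premixed signals; in practice the mask has to be estimated, and that estimation is the problem the paper addresses. The function name `binary_mask` and the toy energy values are hypothetical and not taken from the paper.

```python
def binary_mask(speech_energy, noise_energy):
    """Label each time-frequency unit as reliable (1) or unreliable (0).

    A unit is reliable if its speech energy exceeds its noise energy.
    Both inputs are lists of rows (frequency channels), with each row
    holding one value per time frame.
    """
    return [
        [1 if s > n else 0 for s, n in zip(s_row, n_row)]
        for s_row, n_row in zip(speech_energy, noise_energy)
    ]


# Made-up energies: 2 frequency channels x 3 time frames.
speech = [[4.0, 0.5, 3.0], [0.2, 2.5, 0.1]]
noise = [[1.0, 1.0, 3.0], [0.4, 0.5, 0.3]]

mask = binary_mask(speech, noise)
print(mask)  # [[1, 0, 0], [0, 1, 0]]
```

Units where the two energies are equal are labeled unreliable here, following the "more speech energy than noise energy" wording.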

Full Text