Abstract
To reduce speech degradation in reverberant environments, we previously proposed a speech dereverberation method based on the modulation transfer function (MTF). The method relies on the MTF relation that the sub-band temporal power envelope of reverberant speech can be represented as the convolution of the temporal power envelopes of clean speech and the room impulse response. The sub-band power envelope of clean speech can therefore be estimated by inverse MTF filtering without measuring the room impulse response. We tested the effectiveness of this method as a front-end for automatic speech recognition (ASR) in both artificial and real reverberant environments. Reverberant speech signals were created by convolving clean speech (AURORA-2J) with artificially produced or real room impulse responses. A method based on relative spectral filtering of the auditory power spectrum was used as the baseline. Compared with the baseline, the proposed method improved the error reduction rate by 36.64% in artificial reverberant environments (reverberation times from 0.2 to 2.0 s) and by 21.68% in real reverberant environments (43 reverberant impulse responses). These results indicate that the proposed method can be used as a robust front-end for ASR. [Work supported by a Grant-in-Aid for Science Research from the Japanese Ministry of Education (No. 18680017).]
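The convolution relation and its inversion can be illustrated with a minimal sketch for a single sub-band. The abstract does not give the filter itself, so everything specific here is an assumption: the room impulse response is modeled as exponentially decaying noise, so its power envelope is `a**2 * exp(-13.8 * t / t_r)` (a 60 dB decay over the reverberation time `t_r`), and `t_r` and the gain `a` are treated as known. In the method described above these quantities would be estimated from the reverberant signal rather than supplied. The function names are hypothetical.

```python
import numpy as np

def reverberate_envelope(e_x2, t_r, fs, a=1.0):
    """Convolve a clean power envelope with an assumed exponential RIR power envelope.

    e_x2 : clean sub-band power envelope, sampled at fs (Hz)
    t_r  : reverberation time in seconds (assumed known in this sketch)
    a    : RIR gain (assumed known in this sketch)
    """
    e_x2 = np.asarray(e_x2, dtype=float)
    n = np.arange(len(e_x2))
    # Assumed RIR power envelope: a^2 * exp(-13.8 * t / T_R)
    e_h2 = a**2 * np.exp(-13.8 * n / (t_r * fs))
    # MTF relation: reverberant envelope = clean envelope * RIR envelope (convolution)
    return np.convolve(e_x2, e_h2)[: len(e_x2)]

def inverse_mtf_filter(e_y2, t_r, fs, a=1.0):
    """Undo the exponential smearing with a first-order inverse filter.

    For the assumed RIR envelope, the inverse is
    e_x2[n] = (e_y2[n] - r * e_y2[n-1]) / a^2, with r = exp(-13.8 / (T_R * fs)).
    """
    e_y2 = np.asarray(e_y2, dtype=float)
    r = np.exp(-13.8 / (t_r * fs))
    e_x2 = np.empty_like(e_y2)
    e_x2[0] = e_y2[0]
    e_x2[1:] = e_y2[1:] - r * e_y2[:-1]
    # A power envelope cannot be negative; clip small numerical undershoots.
    return np.maximum(e_x2 / a**2, 0.0)

if __name__ == "__main__":
    fs = 100.0                                   # envelope sampling rate (Hz), illustrative
    t = np.arange(200) / fs
    clean = 1.0 + 0.8 * np.sin(2 * np.pi * 4.0 * t)   # synthetic 4 Hz modulation
    reverberant = reverberate_envelope(clean, t_r=0.5, fs=fs)
    restored = inverse_mtf_filter(reverberant, t_r=0.5, fs=fs)
    print(np.max(np.abs(restored - clean)))
```

With a matched `t_r` the synthetic envelope is recovered up to rounding error. A mismatched `t_r` would leave residual smearing or over-sharpen the envelope, which is why the estimate of the reverberation time matters in practice.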