Abstract

This work evaluates multi-microphone beamforming techniques and single-microphone spectral enhancement strategies for mitigating the effect of reverberation on robust automatic speech recognition (ASR) systems in reverberant environments characterized by different reverberation times (T60) and direct-to-reverberation ratios (DRRs). The systems under test combine minimum variance distortionless response (MVDR) beamformers with minimum mean square error (MMSE) estimators. For the latter, reliable late reverberation spectral variance (LRSV) estimation employing a generalized model of the room impulse response (RIR) is crucial. Based on the generalized RIR model, which separates the direct path from the remaining RIR, two different frequency resolutions in the short-time Fourier transform (STFT) domain, referred to as short-term and long-term, are evaluated to effectively estimate the direct signal. Regarding the fusion of the MVDR beamformer and the MMSE estimator, the LRSV estimator can operate either on the multi-channel observed speech signals or on the single-channel beamformer output. Accordingly, four different combined system architectures are evaluated and analyzed in this contribution, with a focus on optimal ASR performance in terms of word error rate (WER).
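
As a point of reference for the MVDR component, the following is a minimal sketch of the textbook MVDR weight computation for a single STFT frequency bin, w = R^{-1} d / (d^H R^{-1} d). It is not the implementation evaluated in this work. The function name `mvdr_weights`, the four-microphone setup, the far-field steering vector, and the randomly generated noise covariance are all assumptions made for illustration.

```python
import numpy as np

def mvdr_weights(noise_cov, steering):
    """Textbook MVDR weights w = R^{-1} d / (d^H R^{-1} d) for one frequency bin."""
    # Solve R x = d instead of forming the explicit inverse of R.
    r_inv_d = np.linalg.solve(noise_cov, steering)
    return r_inv_d / (steering.conj() @ r_inv_d)

# Hypothetical 4-microphone example for a single STFT bin (illustrative values only).
rng = np.random.default_rng(0)
num_mics = 4
# Assumed far-field steering vector with unit-magnitude entries.
steering = np.exp(-1j * np.pi * 0.3 * np.arange(num_mics))
# Synthetic Hermitian positive-definite noise covariance with diagonal loading.
a = rng.standard_normal((num_mics, 16)) + 1j * rng.standard_normal((num_mics, 16))
noise_cov = a @ a.conj().T / 16 + 1e-3 * np.eye(num_mics)

w = mvdr_weights(noise_cov, steering)

# Distortionless constraint: the response towards the target direction is w^H d = 1.
response = w.conj() @ steering

# Residual noise power w^H R w, compared with a delay-and-sum beamformer.
w_das = steering / num_mics
noise_mvdr = np.real(w.conj() @ noise_cov @ w)
noise_das = np.real(w_das.conj() @ noise_cov @ w_das)
```

Under these assumptions, the MVDR output passes the target direction undistorted while its residual noise power is no larger than that of the delay-and-sum beamformer, which satisfies the same constraint.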
