Abstract
Improving the accuracy of single-channel automatic speech recognition (ASR) in noisy conditions is challenging. Strong speech enhancement front-ends are available; however, they typically require that the ASR model be retrained to cope with the processing artifacts. In this paper we explore a speaker reinforcement strategy for improving recognition performance without retraining the acoustic model (AM). This is achieved by remixing the enhanced signal with the unprocessed input to alleviate the processing artifacts. We evaluate the proposed approach using a DNN speaker-extraction-based speech denoiser trained with a perceptually motivated loss function. Results show that, without AM retraining, our method yields about 23% and 25% relative accuracy gains over the unprocessed signal on the monaural simulated and real CHiME-4 evaluation sets, respectively, and outperforms a state-of-the-art reference method.
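The core of the speaker reinforcement idea, as the abstract describes it, is simply remixing the enhanced output with the unprocessed input so that residual background context masks enhancement artifacts. A minimal sketch of such a remix is below; the function name, the `snr_db` parameter, and the energy-based scaling rule are illustrative assumptions, not the paper's exact formulation.

```python
import numpy as np

def speaker_reinforcement(enhanced, noisy, snr_db=10.0):
    """Remix the enhanced signal with the unprocessed input.

    The noisy input is scaled so that the enhanced speech sits
    roughly snr_db above it in energy, then the two are summed.
    This scaling rule is an assumption for illustration.
    """
    e_pow = np.mean(enhanced ** 2) + 1e-12  # enhanced-signal energy
    n_pow = np.mean(noisy ** 2) + 1e-12     # unprocessed-input energy
    # Gain that places the remixed noisy component snr_db below the
    # enhanced component in average power.
    gain = np.sqrt(e_pow / (n_pow * 10 ** (snr_db / 10)))
    return enhanced + gain * noisy
```

Because the ASR model never sees a purely enhanced (artifact-laden) signal, no AM retraining is needed; the trade-off is that some of the original noise is deliberately reintroduced at a controlled level.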