Abstract

In recent years, deep neural networks have brought significant progress to single-channel speech enhancement. These approaches typically train a model on a synthetic noisy speech corpus created by adding noise to clean speech. Because the synthetic training data does not match the actual application environment, the model's performance in real conditions is not guaranteed. This paper proposes using a multi-channel speech enhancement teacher model to guide a single-channel noise suppression student model: the teacher's processed signal serves as the student's training target. With the proposed approach, the single-channel speech enhancement model can be trained on real noisy speech and can perform as well as a multi-channel speech enhancement model. Experimental results on CHiME-3 demonstrate that the proposed approach can achieve competitive performance in both speech enhancement and automatic speech recognition tasks, even without ground-truth signals.
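The teacher-student idea can be illustrated with a toy sketch. Everything in it is an assumption made for illustration and is not the paper's implementation: the teacher is a plain channel average (a zero-delay delay-and-sum beamformer), the student is a 5-tap FIR filter rather than a neural network, and the microphone signals are simulated only so the script runs. The names `teacher_enhance`, `make_frames`, and `train_student` are hypothetical. The point it shows is that the student's loss is computed against the teacher's output, and no clean reference enters training.

```python
import numpy as np

def teacher_enhance(mics):
    # Hypothetical stand-in teacher: average the channels
    # (a delay-and-sum beamformer with zero delays).
    return mics.mean(axis=0)

def make_frames(signal, taps):
    # Build an (n_samples, taps) matrix of centered sliding windows.
    pad = taps // 2
    padded = np.pad(signal, (pad, taps - 1 - pad))
    return np.stack([padded[i:i + len(signal)] for i in range(taps)], axis=1)

def mse(a, b):
    return float(np.mean((a - b) ** 2))

def train_student(noisy, target, taps=5, steps=500, lr=0.05):
    # Hypothetical stand-in student: an FIR filter fitted by gradient
    # descent on the MSE between its output and the teacher's output.
    X = make_frames(noisy, taps)
    w = np.zeros(taps)
    w[taps // 2] = 1.0  # start as a pass-through filter
    for _ in range(steps):
        err = X @ w - target
        w -= lr * (2.0 / len(target)) * (X.T @ err)
    return w

# Simulated 4-microphone recording (assumption: same source on every
# channel, independent noise per channel). The clean source is used
# only to synthesize the recording, never as a training target.
rng = np.random.default_rng(0)
t = np.arange(4000) / 16000.0
source = np.sin(2 * np.pi * 220.0 * t)
mics = source + 0.5 * rng.standard_normal((4, t.size))

target = teacher_enhance(mics)   # teacher output = training target
noisy = mics[0]                  # the student sees one channel only
w = train_student(noisy, target)

initial_loss = mse(noisy, target)
final_loss = mse(make_frames(noisy, 5) @ w, target)
print("loss vs. teacher target before/after:", initial_loss, final_loss)
```

In a real setting the stand-ins would be replaced by an actual multi-channel enhancement front end and a neural single-channel model; the training loop keeps the same shape, with the teacher's output as the regression target.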
