Abstract

In single-channel speech enhancement, generative adversarial networks (GANs) have proved effective at removing noise and improving the intelligibility and quality of speech. However, in low signal-to-noise ratio (SNR) environments, heavy noise severely degrades speech quality, and it is challenging to remove the noise directly from the noisy speech. Here, we propose a feature-matching speech denoising GAN method that uses progressive training for low-SNR conditions. Our approach decomposes the challenging task into several simpler tasks by progressively increasing the SNR of the data and the depth of the networks simultaneously. Through this training approach, the networks can reconstruct the distribution of clean speech in finer detail. Meanwhile, we combine a discriminator-based feature-matching strategy with a traditional feature-mapping method to reduce the discrepancy between the distributions of the enhanced features and the clean features. The generator and the discriminator are jointly optimized through a minimax adversarial loss. Evaluated on the AISHELL-2 dataset, the proposed GANs achieve a relative word error rate (WER) improvement of 7.50% on the ASR-Clean model, which is trained on clean data, and a relative improvement of 8.03% on the ASR-MTR model, which is trained with multi-style training. On STOI and PESQ, the relative improvements are 4.69% and 10.67%, respectively.
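The two ideas named in the abstract, a progressive schedule and a discriminator-based feature-matching term, can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the abstract does not give the SNR levels, the depth growth rule, or the exact form of the loss. The function names `progressive_stages` and `feature_matching_loss` are hypothetical, the depth is assumed to grow by one layer per stage, and the feature-matching term is assumed to be a mean absolute difference between intermediate discriminator features.

```python
from typing import List, Sequence, Tuple


def progressive_stages(
    snr_levels_db: Sequence[float], base_depth: int
) -> List[Tuple[float, int]]:
    """Pair each training stage's SNR (in dB) with a network depth.

    Assumption: stages run from the lowest SNR to the highest, and the
    depth grows by one layer per stage. The paper's actual rule may differ.
    """
    ordered = sorted(snr_levels_db)
    return [(snr, base_depth + i) for i, snr in enumerate(ordered)]


def feature_matching_loss(
    clean_feats: Sequence[Sequence[float]],
    enhanced_feats: Sequence[Sequence[float]],
) -> float:
    """Mean absolute difference between per-layer discriminator features.

    Each argument holds one flat feature vector per discriminator layer.
    Assumption: an L1 distance averaged within each layer, then averaged
    across layers.
    """
    if not clean_feats or len(clean_feats) != len(enhanced_feats):
        raise ValueError("need the same non-zero number of layers")
    per_layer = []
    for clean, enhanced in zip(clean_feats, enhanced_feats):
        if not clean or len(clean) != len(enhanced):
            raise ValueError("layer feature vectors must match in length")
        diff = sum(abs(c - e) for c, e in zip(clean, enhanced))
        per_layer.append(diff / len(clean))
    return sum(per_layer) / len(per_layer)


if __name__ == "__main__":
    # Hypothetical SNR levels; the abstract does not list the ones used.
    for snr, depth in progressive_stages([5.0, -5.0, 0.0], base_depth=2):
        print(f"stage SNR {snr:+.1f} dB, depth {depth}")
    print(feature_matching_loss([[1.0, 2.0], [0.0]], [[1.0, 4.0], [1.0]]))
```

In a real system the feature vectors would come from the discriminator's intermediate activations for clean and enhanced inputs, and this term would be added to the adversarial and feature-mapping losses with weights that the abstract does not specify.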
