Abstract

Speech enhancement based on generative adversarial networks has achieved excellent results with large quantities of data, but performance in the low-data regime and on tasks such as learning from unseen data still lags behind. In this work, we model a Wasserstein Conditional Generative Adversarial Network with Gradient Penalty speech enhancement system and introduce the elastic net into the objective function to simplify the model and improve its performance in low-resource data environments. We argue that regularization is significant when learning with small amounts of data and that the available information in the input data is key to the speech enhancement performance and generalization ability of the model; network parameters and network structure can therefore be set up and designed according to the characteristics of the actual input data. Experiments on a noisy speech corpus show that the improved algorithm outperforms previous generative adversarial network speech enhancement approaches.
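As a minimal illustration of the regularizer named in the abstract, the sketch below computes an elastic-net penalty (a weighted sum of L1 and L2 terms) over a set of parameter arrays. The function name and the penalty weights `lam_l1` and `lam_l2` are illustrative assumptions, not identifiers from the paper.

```python
import numpy as np

def elastic_net(params, lam_l1=1e-4, lam_l2=1e-4):
    """Elastic-net penalty: lam_l1 * ||w||_1 + lam_l2 * ||w||_2^2.
    Added to a training objective, the L1 part pushes weights toward
    sparsity while the L2 part keeps them small and stable."""
    flat = np.concatenate([p.ravel() for p in params])
    return lam_l1 * np.abs(flat).sum() + lam_l2 * (flat ** 2).sum()
```

In practice the penalty would be summed over all trainable weights of the network and added to the adversarial loss before each optimizer step.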

Highlights

  • Speech enhancement is one of the main technologies for improving the performance of speech systems in noisy environments [1,2,3]

  • Comparison of convergence performance: the convergence of the loss function is used to estimate the convergence of the speech enhancement generative adversarial network (SEGAN) model and the improved SEWCGAN model

  • The main purpose is to explore the impact on generalization performance, so we show the average signal-to-noise ratio (SNR) of SEGAN only

Summary

Introduction

Speech enhancement is one of the main technologies for improving the performance of speech systems in noisy environments [1,2,3]. The generative adversarial network (GAN) has shown great potential in deep learning and has been applied to the field of speech enhancement with large quantities of data; it overcomes the limitations of traditional networks trained for specific targets and shows good generalization to unseen environmental noise [4, 5]. However, problems such as training instability and mode collapse affect the practical application of GANs. Many improved algorithms, such as the conditional GAN and the Wasserstein GAN, have been proposed to address these disadvantages, but these improvements have not yet been applied to speech enhancement, and performance in low-resource environments still lags. The goal is to learn a generator distribution that matches the real data distribution.
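To make the Wasserstein-GAN objective mentioned above concrete, the sketch below assembles a WGAN critic loss with a gradient penalty and an elastic-net term. As an assumption for illustration only, the critic here is linear, D(x) = w·x + b, so that the gradient of D with respect to its input is simply w and the gradient penalty has the closed form (||w||₂ − 1)²; the actual system uses deep convolutional networks, where the penalty gradient must be computed by automatic differentiation.

```python
import numpy as np

def critic_loss_wgan_gp(w, b, x_real, x_fake,
                        lam_gp=10.0, lam_l1=1e-4, lam_l2=1e-4):
    """WGAN critic loss with gradient penalty and elastic-net terms,
    for a linear critic D(x) = w.x + b (illustrative assumption).
    For a linear critic, grad_x D(x) = w everywhere, so the gradient
    penalty reduces to (||w||_2 - 1)^2."""
    d_real = x_real @ w + b                    # critic scores on clean speech features
    d_fake = x_fake @ w + b                    # critic scores on enhanced (generated) features
    wasserstein = d_fake.mean() - d_real.mean()        # critic minimizes this term
    gp = (np.linalg.norm(w) - 1.0) ** 2                # gradient penalty
    elastic = lam_l1 * np.abs(w).sum() + lam_l2 * (w ** 2).sum()  # elastic net
    return wasserstein + lam_gp * gp + elastic

# Toy usage on random "feature" batches (loc shifts stand in for clean vs. noisy).
rng = np.random.default_rng(0)
w, b = rng.normal(size=8), 0.0
x_real = rng.normal(loc=1.0, size=(32, 8))
x_fake = rng.normal(loc=0.0, size=(32, 8))
loss = critic_loss_wgan_gp(w, b, x_real, x_fake)
```

The generator would be trained in alternation to minimize −mean(D(x_fake)), optionally conditioned on the noisy input as in the conditional GAN variant.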

Methods
Results
Conclusion
