Abstract

The performance of speech enhancement algorithms can be further improved by considering the application scenarios of speech products. In this paper, we propose an attention-based branchy neural network framework by incorporating the prior environmental information for noise reduction. In the whole denoising framework, first, an environment classification network is trained to distinguish the noise type of each noisy speech frame. Guided by this classification network, the denoising network gradually learns respective noise reduction abilities in different branches. Unlike most deep neural network (DNN)-based methods, which learn speech reconstruction capabilities with a common neural structure from all training noises, the proposed branchy model obtains greater performance benefits from the specially trained branches of prior known noise interference types. Experimental results show that the proposed branchy DNN model not only preserved better enhanced speech quality and intelligibility in seen noisy environments, but also obtained good generalization in unseen noisy environments.

Highlights

  • Speech enhancement techniques have been widely used to cope with the noise interference problem in the front end of many speech applications, such as mobile phones, hearing aids, and speech recognition products

  • We investigated the effect of prior environment information for the performance improvement of deep neural network (DNN)-based speech enhancement algorithms

  • An environment attention guided branchy neural network was proposed to cope with the noise interference problem in some known application scenarios

Read more

Summary

Introduction

Speech enhancement techniques have been widely used to cope with the noise interference problem in the front end of many speech applications, such as mobile phones, hearing aids, and speech recognition products. Deep neural network (DNN)-based speech enhancement methods have shown significant performance advantages over the traditional approaches in complex noise environments, even the extremely nonstationary noises. Richer datasets, and a better objective function and neural network models were further explored to guarantee the robust generalization ability of DNN models to cope with the diversified noise environments in real life. In the research of [14], experimental results demonstrated that the richness of the clean speech samples and the noise samples were the two crucial aspects to improve the generalization capacity of DNNs. Recently, researchers have paid more attention to the optimization of DNN models for the speech enhancement task. The optimization of the objective function [20,21] has been explored

Methods
Results
Conclusion

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call

Disclaimer: All third-party content on this website/platform is and will remain the property of their respective owners and is provided on "as is" basis without any warranties, express or implied. Use of third-party content does not indicate any affiliation, sponsorship with or endorsement by them. Any references to third-party content is to identify the corresponding services and shall be considered fair use under The CopyrightLaw.