Abstract

In this paper, a novel deep neural network (DNN) architecture is proposed to generate the speech features of both the target speaker and interferer for speech separation without using any prior information about the interfering speaker. DNN is adopted here to directly model the highly nonlinear relationship between speech features of the mixed signals and the two competing speakers. Experimental results on a monaural speech separation and recognition challenge task show that the proposed DNN framework enhances the separation performance in terms of different objective measures under the semi-supervised mode where the training data of the target speaker is provided while the unseen interferer in the separation stage is predicted by using multiple interfering speakers mixed with the target speaker in the training stage. Furthermore, as a preprocessing step in the testing stage for robust speech recognition, our speech separation approach can achieve significant improvements of the recognition accuracy over the baseline system with no source separation.

Full Text
Published version (Free)

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call