Abstract

In this paper, we propose a new signal-noise-dependent (SND) deep neural network (DNN) framework to further improve the separation and recognition performance of the recently developed technique for general DNN-based speech separation. We adopt a divide-and-conquer strategy to design SND-DNNs that model mixed speech at a higher resolution than a single general DNN, which cannot well accommodate all the speaker-mixing variabilities across different signal-to-noise ratio (SNR) levels. In this study, two SNR-dependent DNNs, namely a positive DNN and a negative DNN, are trained to cover mixed speech signals at positive and negative SNR levels, respectively. At the separation stage, a first-pass separation using a general DNN provides an accurate SNR estimate for model selection. Experimental results on the Speech Separation Challenge (SSC) task show that SND-DNNs yield significant improvements in both speech separation and recognition over a general DNN. Furthermore, this purely front-end processing method achieves a relative word error rate reduction of 11.6% over a state-of-the-art recognition system in which a complicated joint decoding framework must be implemented in the back-end.
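The model-selection step described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the SNR estimator, the stand-in model labels, and all function names are hypothetical, and it assumes the first-pass general DNN has already produced rough estimates of the target and interfering signals.

```python
import numpy as np

def estimate_snr_db(target_est, interference_est):
    """Hypothetical SNR estimate (dB) from first-pass separated waveforms."""
    p_target = float(np.sum(target_est ** 2))
    p_interf = float(np.sum(interference_est ** 2))
    return 10.0 * np.log10(p_target / max(p_interf, 1e-12))

def select_snd_dnn(snr_db, positive_dnn, negative_dnn):
    """SND model selection: positive-SNR mixtures go to the positive DNN,
    negative-SNR mixtures to the negative DNN."""
    return positive_dnn if snr_db >= 0.0 else negative_dnn

# Toy usage with string labels standing in for the trained networks.
rng = np.random.default_rng(0)
target_est = rng.standard_normal(16000)          # louder "target" estimate
interference_est = 0.5 * rng.standard_normal(16000)  # quieter "interferer"

snr_db = estimate_snr_db(target_est, interference_est)
chosen = select_snd_dnn(snr_db, "positive_dnn", "negative_dnn")
```

The chosen SND-DNN would then perform the second-pass, higher-resolution separation on the original mixture.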
