Abstract

Deep learning models perform poorly under acoustic conditions they were not trained for. Consequently, neural network based speech separation models designed for anechoic conditions degrade in reverberant conditions. Because training a deep neural network is a lengthy and computationally expensive process, retraining it for every change in acoustic conditions is rarely practical. This paper presents a comparative study of several state-of-the-art dereverberation algorithms and identifies the best among them for enabling a U-Net based speech separation model, trained in anechoic conditions, to work under different reverberant conditions in both online and offline applications. The results show that dereverberating the audio mixtures with the spectral subtraction (SS) algorithm before they enter the anechoic U-Net based speech separation network improves the signal-to-distortion ratio (SDR) by almost 0.84 dB for online applications, compared with exposing the anechoic U-Net model directly to reverberation. For offline applications, dereverberating the mixtures with a cascaded system of the weighted prediction error (WPE) and spectral subtraction (SS) models yields an average improvement of 2 dB in SDR and 4% in short-time objective intelligibility (STOI).
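To make the front-end concrete, the sketch below shows a generic spectral subtraction dereverberation step of the kind the abstract describes: the late-reverberation magnitude spectrum is approximated as a delayed, attenuated copy of the observed magnitude spectrum and subtracted in the STFT domain, after which the enhanced mixture would be passed to the separation network. This is a minimal illustration, not the paper's implementation; the frame delay, attenuation factor `gamma`, and spectral floor are hypothetical parameter choices.

```python
import numpy as np
from scipy.signal import stft, istft

def spectral_subtraction_dereverb(x, fs=16000, nperseg=512, noverlap=384,
                                  delay_frames=7, gamma=0.32, floor=0.1):
    """Simple STFT-domain spectral subtraction dereverberation sketch."""
    # Analysis: complex spectrogram of the reverberant mixture
    _, _, X = stft(x, fs=fs, nperseg=nperseg, noverlap=noverlap)
    mag = np.abs(X)

    # Late-reverberation estimate: attenuated, frame-delayed magnitude
    # (a common simplification; parameter values here are illustrative)
    late = np.zeros_like(mag)
    late[:, delay_frames:] = gamma * mag[:, :-delay_frames]

    # Subtract and apply a spectral floor to limit musical noise
    enhanced_mag = np.maximum(mag - late, floor * mag)

    # Synthesis: reuse the reverberant phase
    X_enh = enhanced_mag * np.exp(1j * np.angle(X))
    _, x_enh = istft(X_enh, fs=fs, nperseg=nperseg, noverlap=noverlap)
    return x_enh
```

In the offline cascade suggested by the results, a WPE stage (e.g. a multi-channel linear-prediction dereverberator) would precede this step, and the dereverberated mixture would then be fed to the anechoic-trained U-Net separator.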
