Abstract

Deep learning-based methods have achieved significant advances in speech separation, and time-domain separation methods in particular have delivered the best performance in recent years. However, time-domain methods are unstable in their waveform transformation, which makes them prone to amplitude and phase errors. Considering the robustness of time-frequency (T-F) domain methods, we propose an innovative network architecture called the Time-Frequency Domain Corrector Network (TFCNet), which consists of a time-domain separator and a specially designed T-F domain corrector. The corrector module is added after the time-domain separation step to correct the real and imaginary components in the T-F domain. The proposed model achieves state-of-the-art performance, with an SI-SDRi of 22.2 dB on the WSJ0-2mix dataset and 19.4 dB on the Libri-2mix dataset.
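
The results above are reported as SI-SDRi, the improvement in scale-invariant signal-to-distortion ratio of the separated signal over the unprocessed mixture. The abstract does not spell the metric out, so the following is only a minimal sketch of the standard definition, not the authors' evaluation code. The function names, the `eps` stabilizer, and the mean-removal step (a common convention) are assumptions made for illustration.

```python
import math


def si_sdr(estimate, reference, eps=1e-8):
    """Scale-invariant SDR in dB between two equal-length 1-D signals."""
    if len(estimate) != len(reference):
        raise ValueError("estimate and reference must have the same length")

    # Assumption: remove the mean of each signal, as many implementations do.
    est_mean = sum(estimate) / len(estimate)
    ref_mean = sum(reference) / len(reference)
    est = [e - est_mean for e in estimate]
    ref = [r - ref_mean for r in reference]

    # Project the estimate onto the reference to obtain the scaled target.
    dot = sum(e * r for e, r in zip(est, ref))
    ref_energy = sum(r * r for r in ref)
    alpha = dot / (ref_energy + eps)
    target = [alpha * r for r in ref]

    # Everything in the estimate that is not the scaled target is distortion.
    noise = [e - t for e, t in zip(est, target)]
    target_energy = sum(t * t for t in target)
    noise_energy = sum(n * n for n in noise)
    return 10.0 * math.log10((target_energy + eps) / (noise_energy + eps))


def si_sdr_improvement(estimate, mixture, reference):
    """SI-SDRi: SI-SDR of the estimate minus SI-SDR of the raw mixture."""
    return si_sdr(estimate, reference) - si_sdr(mixture, reference)
```

Because the estimate is projected onto the reference before the ratio is taken, rescaling the estimate leaves the score unchanged, so the metric does not reward or penalize a simple gain change in the separated output.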

