On the Compensation Between Magnitude and Phase in Speech Separation

Zhong-Qiu Wang, Jonathan Le Roux, Gordon Wichern

doi:10.1109/lsp.2021.3116502

Journal: IEEE Signal Processing Letters
Publication Date: Jan 1, 2021
Citations: 63

Abstract

Deep neural network (DNN) based end-to-end optimization in the complex time-frequency (T-F) domain or time domain has shown considerable potential in monaural speech separation. Many recent studies optimize loss functions defined solely in the time or complex domain, without including a loss on magnitude. Although such loss functions typically produce better scores on objective time-domain metrics, they produce worse scores on speech quality and intelligibility metrics and usually lead to worse speech recognition performance than loss functions that include a loss on magnitude. While this phenomenon has been observed experimentally in many studies, it is often not accurately explained, and a thorough understanding of its fundamental cause is lacking. This paper provides a novel view from the perspective of the implicit compensation between estimated magnitude and phase. Analytical results based on monaural speech separation and robust automatic speech recognition (ASR) tasks in noisy-reverberant conditions support the validity of our view.
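
To make the idea of compensation between magnitude and phase concrete, the following is a minimal single-bin sketch and not the authors' analysis or code. It assumes a squared-error loss in the complex domain and a fixed, imperfect phase estimate; the function names (`best_magnitude`, `complex_sq_error`, `magnitude_error`) and the numeric values are hypothetical, chosen only for illustration. Under those assumptions, the magnitude that minimizes the complex-domain error is the projection of the target onto the estimated phase direction, which is smaller than the true magnitude whenever the phase is wrong.

```python
import cmath
import math


def best_magnitude(target: complex, phase_est: float) -> float:
    """Magnitude m >= 0 minimizing |m * exp(j * phase_est) - target|^2.

    The minimizer is the projection of the target onto the unit vector
    exp(j * phase_est), clipped at zero: |target| * cos(phase error).
    """
    proj = (target * cmath.exp(-1j * phase_est)).real
    return max(proj, 0.0)


def complex_sq_error(est: complex, target: complex) -> float:
    """Squared error in the complex domain for one T-F bin."""
    return abs(est - target) ** 2


def magnitude_error(est: complex, target: complex) -> float:
    """Absolute error between estimated and target magnitudes."""
    return abs(abs(est) - abs(target))


# Illustrative values (assumptions): unit-magnitude target, 60-degree phase error.
target = cmath.rect(1.0, 0.3)
phase_est = 0.3 + math.pi / 3

m = best_magnitude(target, phase_est)
compensated = cmath.rect(m, phase_est)                # minimizes complex-domain error
uncompensated = cmath.rect(abs(target), phase_est)    # keeps the true magnitude

print("compensated magnitude:", m)
print("complex error (compensated):  ", complex_sq_error(compensated, target))
print("complex error (uncompensated):", complex_sq_error(uncompensated, target))
print("magnitude error (compensated):  ", magnitude_error(compensated, target))
print("magnitude error (uncompensated):", magnitude_error(uncompensated, target))
```

In this toy setting the compensated estimate has the lower complex-domain error but the larger magnitude error, which is consistent with the trade-off the abstract describes between time- or complex-domain metrics and magnitude-sensitive metrics.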

Full Text