Abstract
This paper proposes a novel cascade architecture to address the monaural speech enhancement problem. We leverage three different domains of speech representation, namely spectral magnitude, waveform, and complex spectrogram, to progressively suppress the background noise within noisy speech. Our proposed neural cascade architecture consists of three modules, and each operates on the original noisy input and the output of the previous module in a distinct speech representation. During training, the network simultaneously optimizes all modules with a triple-domain loss. Experiments on the WSJ0 SI-84 corpus demonstrate that our proposed approach achieves superior enhancement results, and substantially outperforms previous baselines in terms of both speech quality and intelligibility.
Published Version
Talk to us
Join us for a 30 min session where you can share your feedback and ask us any queries you have