Abstract

In this paper, a robust binaural speech separation system based on a deep neural network (DNN) is introduced. The proposed system has three main processing stages. In the spectral processing stage, the multiresolution cochleagram (MRCG) feature is extracted from the beamformed signal. In the spatial processing stage, a novel, reliable spatial feature, smITD + smILD, is obtained by applying soft missing-data masking to the binaural cues (interaural time and level differences). In the final stage, a deep neural network takes the combined spectral and spatial features and estimates a newly defined ideal ratio mask (IRM) designed for noisy and reverberant conditions. The performance of the proposed system is evaluated and compared with two recent binaural speech separation systems as baselines in various noisy and reverberant conditions. Furthermore, the performance of each processing stage is explored and compared to those of state-of-the-art approaches. A multitalker, spatially diffuse babble is used as the interferer at four signal-to-noise ratios (SNRs). Simulated rooms with four matched and four unmatched reverberation times (RTs) are considered in the experiments. It is shown that the proposed system outperforms the baseline systems in improving the intelligibility and quality of the separated speech signals in reverberant and noisy conditions. The results confirm the efficiency of each system component, especially in highly reverberant scenarios.
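To make the masking target concrete, the sketch below shows the conventional ideal ratio mask computed per time-frequency unit from speech and noise power spectra. Note that this is only the standard IRM formulation for illustration; the paper's newly defined IRM variant for noisy and reverberant conditions is specified in the full text, and the function name and the exponent `beta` here are assumptions.

```python
import numpy as np

def ideal_ratio_mask(speech_power, noise_power, beta=0.5, eps=1e-12):
    """Conventional IRM per time-frequency unit:
        IRM = (S / (S + N)) ** beta,
    where S and N are speech and noise power spectra.
    This is the standard definition, not the paper's modified IRM.
    """
    return (speech_power / (speech_power + noise_power + eps)) ** beta

# Toy 2x2 time-frequency power grids (hypothetical values).
S = np.array([[4.0, 1.0], [9.0, 0.0]])
N = np.array([[4.0, 3.0], [16.0, 1.0]])
mask = ideal_ratio_mask(S, N)  # values lie in [0, 1]
```

Applying the estimated mask to the noisy mixture spectrogram (element-wise multiplication) then attenuates noise-dominated units while preserving speech-dominated ones.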
