Abstract

Single-channel speech separation (SCSS) is often required as post-processing in applications that facilitate automatic human-to-human or human-to-machine communication in challenging acoustic environments, such as voice command for smart homes or robotics. The proposed SCSS system, which the authors call phase-aware subspace decomposition (PASD), relies on subspace decomposition for speech separation, followed by a phase-aware mask for final subspace recovery. Specifically, the approach decomposes the mixture into sparse and low-rank subspaces in the frequency domain through rank minimisation. The decomposition is iterative, applies adaptive thresholding in each iteration to achieve soft estimation, and takes phase information into account for reconstruction. Separation results are reported in terms of both intrusive and non-intrusive metrics using realistic recordings corrupted with real-life noises. Because speech separation systems are expected to achieve maximal interference rejection without speech distortion, the proposed system is also evaluated by recognising speech from a target speaker in the presence of either concurrent speech or noise. Recognition results show that the separated signals are highly intelligible and can therefore be exploited by other automatic applications. Extensive evaluation under different test scenarios shows that PASD consistently improves the quality of the separated signals compared with other benchmark approaches.
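
To make the idea of an iterative sparse plus low-rank split concrete, the following is a minimal sketch of a generic alternating soft-thresholding scheme applied to a real-valued matrix such as a magnitude spectrogram. It is not the PASD algorithm itself: the function names, the initial thresholds, and the geometric `decay` schedule are hypothetical stand-ins for the adaptive thresholding mentioned above, and the phase-aware mask is not modelled.

```python
import numpy as np


def soft_threshold(x, tau):
    """Elementwise soft-thresholding (shrinkage) of a real array."""
    return np.sign(x) * np.maximum(np.abs(x) - tau, 0.0)


def svd_threshold(x, tau):
    """Soft-threshold the singular values of x, which favours low rank."""
    u, s, vt = np.linalg.svd(x, full_matrices=False)
    return (u * soft_threshold(s, tau)) @ vt


def low_rank_sparse_split(m, n_iter=50, decay=0.9):
    """Split m into a low-rank part and a sparse part.

    Assumption: thresholds start at half the spectral norm and half the
    largest magnitude of m, then shrink geometrically. This schedule is
    illustrative only and is not taken from the paper.
    """
    low_rank = np.zeros_like(m)
    sparse = np.zeros_like(m)
    tau_l = 0.5 * np.linalg.norm(m, 2)  # spectral norm of m
    tau_s = 0.5 * np.abs(m).max()
    for _ in range(n_iter):
        # Low-rank update: shrink singular values of the residual.
        low_rank = svd_threshold(m - sparse, tau_l)
        # Sparse update: shrink individual entries of the residual.
        sparse = soft_threshold(m - low_rank, tau_s)
        # Hypothetical adaptive step: lower both thresholds each iteration.
        tau_l *= decay
        tau_s *= decay
    return low_rank, sparse
```

In a speech setting, `m` would typically be the magnitude of a short-time Fourier transform of the mixture. Which of the two parts corresponds to the target speech depends on the signals involved, and the mixture phase would still have to be handled separately at reconstruction.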
