Abstract

Single-channel speech separation (SCSS) is widely used in real-time applications, for example as a preprocessing stage for speech recognition in humanoid robot control and in hearing aids. Separation performance is crucial for these applications. In this paper, we propose a new approach to unsupervised SCSS. The separation relies on an optimized subspace decomposition that splits the mixed signal into three estimates: the sparse subspace, the sub-sparse subspace and the low-rank subspace. A soft mask at the core of the proposed approach makes the final decision. The proposed system generates two separated signals of different quality, which are provided on two separate channels. Channel classification is performed using fuzzy logic, which requires two parameters. The first parameter is the quality of the separated signal, which we determine using a nonintrusive metric for speech quality and intelligibility. The second parameter is the gender of the speaker, determined using a proposed F0 tracking algorithm. The evaluation results of the proposed approach are reported and compared with other state-of-the-art approaches. On average, the proposed method achieves a 67.9% improvement in PESQ, a 59.5% improvement in signal-to-interference ratio (SIR) and a 10.5% improvement in the target-related perceptual score (TPS) over the benchmark methods.
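
The abstract does not specify the form of the soft mask, so the following is only an illustrative sketch of a generic ratio-type soft mask, assuming two non-negative magnitude estimates (for example, one per speaker) on the same time-frequency grid. The function names `soft_mask` and `apply_mask`, the exponent `power`, and the toy values are assumptions made for illustration, not the authors' implementation.

```python
def soft_mask(mag_a, mag_b, power=2.0, eps=1e-12):
    """Ratio-type soft mask for source A (hypothetical helper).

    mag_a, mag_b: 2-D lists of non-negative magnitude estimates
    (rows = frequency bins, columns = time frames).
    Each mask value lies in [0, 1]: near 1 where A dominates,
    near 0 where B dominates.
    """
    mask = []
    for row_a, row_b in zip(mag_a, mag_b):
        mask.append([
            (a ** power) / (a ** power + b ** power + eps)
            for a, b in zip(row_a, row_b)
        ])
    return mask


def apply_mask(mask, mixture_mag):
    """Weight the mixture magnitude element-wise by the mask."""
    return [
        [m * x for m, x in zip(mask_row, mix_row)]
        for mask_row, mix_row in zip(mask, mixture_mag)
    ]


# Toy 2x2 magnitude estimates (made-up values, not data from the paper).
est_a = [[3.0, 0.0], [1.0, 2.0]]
est_b = [[1.0, 2.0], [1.0, 0.0]]
mixture = [[4.0, 2.0], [2.0, 2.0]]

mask_a = soft_mask(est_a, est_b)
separated_a = apply_mask(mask_a, mixture)
```

Unlike a binary mask, which assigns each time-frequency bin entirely to one source, a soft mask shares the energy of each bin in proportion to the estimates. In this sketch the small constant `eps` only guards against division by zero in bins where both estimates are zero.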
