Abstract

In many applications, such as music transcription, audio forensics, and speech source separation, a mono recording must be decomposed into its constituent sources. Techniques for this task are usually referred to as blind source separation (BSS). Non-negative matrix factorization (NMF) has recently been used for BSS in both supervised and unsupervised settings. In this paper, we propose a novel NMF-based algorithm, multi-layer KL-CNMF (Kullback-Leibler complex NMF) with fuzzy initial clustering, to improve BSS performance in the unsupervised mode. In addition, we use linear predictive coding (LPC) error clustering as a powerful criterion, especially for harmonic signals such as certain speech sources, for separating the sources from their multi-layer KL-CNMF components. Results on speech mixtures from the TIMIT database, measured by signal-to-distortion ratio (SDR) and signal-to-interference ratio (SIR), show that the proposed system significantly outperforms the baseline system, an NMF-based BSS with LPC error clustering.
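
For readers unfamiliar with the underlying factorization, the following is a minimal sketch of plain single-layer NMF under the generalized Kullback-Leibler divergence, using the standard multiplicative updates. It is not the proposed multi-layer KL-CNMF: it handles no phase, has no layers, and performs no fuzzy or LPC error clustering. The function names (`kl_divergence`, `kl_nmf`) and all parameters (`rank`, `n_iter`, `seed`, `eps`) are assumptions chosen for illustration.

```python
import numpy as np


def kl_divergence(V, V_hat, eps=1e-12):
    """Generalized KL divergence D(V || V_hat), summed over all entries."""
    return float(np.sum(V * np.log((V + eps) / (V_hat + eps)) - V + V_hat))


def kl_nmf(V, rank, n_iter=200, seed=0, eps=1e-12):
    """Factor a non-negative matrix V (freq x frames) as W @ H.

    W holds `rank` spectral basis vectors; H holds their activations over time.
    Uses the standard multiplicative updates for the KL divergence.
    """
    rng = np.random.default_rng(seed)
    n_freq, n_frames = V.shape
    W = rng.random((n_freq, rank)) + eps
    H = rng.random((rank, n_frames)) + eps
    for _ in range(n_iter):
        # Update the basis vectors with the activations held fixed.
        ratio = V / (W @ H + eps)
        W *= (ratio @ H.T) / (H.sum(axis=1) + eps)
        # Update the activations with the basis vectors held fixed.
        ratio = V / (W @ H + eps)
        H *= (W.T @ ratio) / (W.sum(axis=0)[:, None] + eps)
    return W, H


# Illustrative input: a random non-negative matrix standing in for a
# magnitude spectrogram (hypothetical data, not from TIMIT).
V = np.abs(np.random.default_rng(1).normal(size=(20, 30)))
W, H = kl_nmf(V, rank=4)
```

In a separation pipeline, `V` would typically be the magnitude spectrogram of the mono mixture, and the columns of `W` with their rows of `H` would then be grouped into sources by a separate clustering step.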
