Abstract
Many applications, such as music transcription, audio forensics, and speech source separation, require decomposing a mono recording into its constituent sources. Techniques for this task are usually referred to as blind source separation (BSS). Non-negative matrix factorization (NMF) has recently been used for BSS in both supervised and unsupervised settings. In this paper, we propose a novel NMF-based algorithm, multi-layer KL-CNMF (Kullback-Leibler Complex NMF), which uses fuzzy initial clustering to improve BSS performance in the unsupervised mode. In addition, we use LPC error clustering as a criterion that is especially effective for separating harmonic signals, such as certain speech sources, from their multi-layer KL-CNMF components. Results on speech mixtures from the TIMIT database, measured by signal-to-distortion ratio (SDR) and signal-to-interference ratio (SIR), show that the proposed system significantly outperforms the baseline system, an NMF-based BSS with LPC error clustering.
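For readers unfamiliar with the building block named above, the following is a minimal sketch of standard single-layer NMF under the generalized Kullback-Leibler divergence, using the well-known multiplicative update rules. It is not the proposed multi-layer KL-CNMF: it omits the complex-valued model, the multi-layer structure, the fuzzy initial clustering, and the LPC error clustering. The function names `kl_nmf` and `kl_divergence`, the random initialization, and the parameters `rank`, `n_iter`, and `eps` are assumptions chosen for illustration only.

```python
import numpy as np


def kl_divergence(V, V_hat, eps=1e-12):
    """Generalized KL divergence D(V || V_hat), summed over all entries."""
    return float(np.sum(V * np.log((V + eps) / (V_hat + eps)) - V + V_hat))


def kl_nmf(V, rank, n_iter=200, seed=0, eps=1e-12):
    """Factor a non-negative matrix V (e.g. a magnitude spectrogram,
    frequencies x frames) as V ~= W @ H with multiplicative KL updates.

    Illustrative sketch only: random initialization is an assumption here,
    not the fuzzy initial clustering described in the abstract.
    """
    rng = np.random.default_rng(seed)
    n_freq, n_frames = V.shape
    W = rng.random((n_freq, rank)) + eps    # spectral basis vectors
    H = rng.random((rank, n_frames)) + eps  # time-varying activations
    ones = np.ones_like(V)

    for _ in range(n_iter):
        # Update activations with the basis held fixed.
        V_hat = W @ H + eps
        H *= (W.T @ (V / V_hat)) / (W.T @ ones + eps)
        # Update the basis with the activations held fixed.
        V_hat = W @ H + eps
        W *= ((V / V_hat) @ H.T) / (ones @ H.T + eps)

    return W, H
```

In an unsupervised separation setting, the columns of `W` and the matching rows of `H` would then have to be grouped into sources by some clustering criterion, which is the role the abstract assigns to LPC error clustering.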