Abstract

This paper proposes a semi-supervised approach to separating speech and music in monaural recordings based on non-negative matrix factorization (NMF). Assuming the genre of the background music is known, music basis vectors are randomly sampled from the magnitude short-time Fourier transform (STFT) of training music, while speech basis vectors are estimated by running NMF on the magnitude STFT of the music-corrupted speech signal. We further apply sparseness and temporal continuity constraints to the speech and music components, respectively, and evaluate how the different constraints affect separation performance. The test set contains 10 Mandarin utterances from 10 speakers, each mixed with music at different speech-to-music ratios (SMRs). The baseline is a semi-supervised separation system with no constraints. The results show that adding the temporal continuity constraint improves separation performance over both the baseline and a system using only the sparseness constraint.
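To make the semi-supervised setup concrete, the following is a minimal Python/NumPy sketch of NMF-based separation in which the music basis vectors are held fixed (taken from training music spectra) and only the speech bases and all activations are learned from the mixture. The function name `semi_supervised_nmf`, its parameters, and the standard KL-divergence multiplicative updates are illustrative assumptions; the sparseness and temporal continuity penalty terms discussed in the paper are omitted here.

```python
import numpy as np

def semi_supervised_nmf(V, W_music, n_speech_bases=30, n_iter=200, eps=1e-9):
    """Illustrative semi-supervised NMF separation (not the paper's exact algorithm).

    V        : magnitude spectrogram of the mixture (freq x frames)
    W_music  : fixed music basis vectors sampled from training music spectra (freq x K_m)
    Returns estimated speech and music magnitude spectrograms.
    """
    rng = np.random.default_rng(0)
    F, T = V.shape
    K_m = W_music.shape[1]

    # Speech bases and all activations are learned from the mixture;
    # the music bases stay fixed -- this is the "semi-supervised" part.
    W_speech = rng.random((F, n_speech_bases)) + eps
    H = rng.random((K_m + n_speech_bases, T)) + eps

    for _ in range(n_iter):
        W = np.hstack([W_music, W_speech])
        V_hat = W @ H + eps

        # Standard multiplicative updates for the generalized KL divergence.
        H *= (W.T @ (V / V_hat)) / (W.T @ np.ones_like(V) + eps)

        V_hat = W @ H + eps
        # Update only the speech bases (columns K_m onward of W).
        num = (V / V_hat) @ H[K_m:].T
        den = np.ones_like(V) @ H[K_m:].T + eps
        W_speech *= num / den

    W = np.hstack([W_music, W_speech])
    V_hat = W @ H + eps
    # Wiener-style masks rebuild each source's magnitude from its own components.
    speech_mag = (W_speech @ H[K_m:]) / V_hat * V
    music_mag = (W_music @ H[:K_m]) / V_hat * V
    return speech_mag, music_mag
```

In a full pipeline, `V` would come from the STFT magnitude of the mixture and the separated magnitudes would be combined with the mixture phase and inverted back to the time domain; the constrained variants add sparseness or temporal continuity penalty terms to the cost function, which modify the update rules shown above.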
