Investigating Single Channel Source Separation Using Non-Negative Matrix Factorization and Its Variants for Overlapping Speech Signal

Nandini C Nag, Milind S Shah

doi:10.1109/icnte44896.2019.8946013

Abstract

As a pre-processor to speech recognition, audio source separation may mitigate the degradation in recognition of individual signals in scenarios such as the cocktail party environment. It may also be used in various other applications such as audio forensics, speaker verification, instrument identification, and hearing aids. Various techniques are available for single channel audio source separation, but the technique based on Non-negative Matrix Factorization (NMF) is widely used. Several research studies have shown considerable improvement in separation performance using NMF on different mixtures of audio signals, such as speech with noise, speech with music, and speech with speech, taken from different audio databases. In this paper, single channel source separation using NMF and its variants for two-speaker mixed signals is investigated using a single speech database, the GRID speech corpus. The separation performance of phase-aware algorithms is compared with that of phase-unaware approaches based on NMF and its variants. The quality of the separated speech was judged while varying parameters such as the number of bases and the analysis window size.
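For readers unfamiliar with the technique, the basic phase-unaware NMF idea can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names `nmf` and `soft_masks` are hypothetical, the updates are the standard multiplicative rules for the Euclidean cost, a random non-negative matrix stands in for a magnitude spectrogram, and the assignment of bases to speakers is assumed to be known (in practice it would typically come from bases trained per speaker). The `n_bases` argument corresponds to the number of bases mentioned above; the analysis window size would enter through the spectrogram computation, which is omitted here.

```python
import numpy as np

def nmf(V, n_bases, n_iter=200, eps=1e-9, seed=0):
    """Approximate non-negative V (freq x frames) as W @ H.

    Uses multiplicative updates for the Euclidean cost, which keep
    W and H non-negative as long as they start non-negative.
    """
    rng = np.random.default_rng(seed)
    n_freq, n_frames = V.shape
    W = rng.random((n_freq, n_bases)) + eps   # spectral bases
    H = rng.random((n_bases, n_frames)) + eps  # activations over time
    for _ in range(n_iter):
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W *= (V @ H.T) / (W @ H @ H.T + eps)
    return W, H

def soft_masks(W, H, groups, eps=1e-9):
    """Build one soft mask per group of bases (assumed: one group per speaker)."""
    total = W @ H + eps
    return [(W[:, g] @ H[g, :]) / total for g in groups]

# Toy stand-in for a mixture magnitude spectrogram (assumption: not real speech).
rng = np.random.default_rng(1)
V = rng.random((64, 6)) @ rng.random((6, 100))

W, H = nmf(V, n_bases=6)
# Hypothetical split: first three bases for speaker 1, last three for speaker 2.
masks = soft_masks(W, H, groups=[[0, 1, 2], [3, 4, 5]])
estimates = [m * V for m in masks]  # magnitude estimates per speaker

rel_err = np.linalg.norm(V - W @ H) / np.linalg.norm(V)
print(f"relative reconstruction error: {rel_err:.4f}")
```

In a phase-unaware pipeline, each magnitude estimate would be combined with the phase of the mixture before inverting the transform; phase-aware variants instead model the phase explicitly.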

Full Text