Layered Convolutive Nonnegative Matrix Factorization for Speech Separation

Wang Yao, Mingyuan Gao, Danjv Lv, Jiali Zi, Xin Huang, Yan Zhang, Rui Xi

doi:10.1088/1742-6596/2258/1/012020

Open Access

https://doi.org/10.1088/1742-6596/2258/1/012020

Abstract

Nonnegative matrix factorization (NMF) has attracted significant attention for its good performance in single-channel speech separation, and improved variants of NMF have become a research hotspot. One such variant, Layered NMF (LNMF), represents the source signal more accurately thanks to its multilayer structure. However, LNMF sometimes performs poorly because it ignores the short-term correlation of speech signals. Building on LNMF and the advantages of Convolutive NMF (CNMF), we propose a Layered Convolutive NMF (LCNMF) algorithm for single-channel speech separation. LCNMF incorporates the multilayer structure into NMF and extends the top-level NMF model to a convolutive one. During training, NMF is used to learn the non-top-level basis matrices and CNMF is used to learn the top-level basis matrix, which is then combined with the basis matrix of each lower layer. During prediction, CNMF is used to separate the mixed signals. Results on the MIK-1K dataset show that LCNMF outperforms NMF and LNMF in separating mixtures of single-channel speech signals. In terms of short-time objective intelligibility (STOI), Source to Distortion Ratio (SDR), Source to Interference Ratio (SIR), and Source to Artifacts Ratio (SAR), LCNMF improves on average by 0.019, 1.049 dB, 1.305 dB, and 0.851 dB, respectively, compared with NMF, and by 0.007, 0.172 dB, 0.090 dB, and 0.366 dB, respectively, compared with LNMF.
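The training procedure summarized in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation and rests on several assumptions the abstract does not confirm: a Euclidean cost with multiplicative updates, lower layers that each factorize the activations of the layer below, and a combination step that multiplies the lower-layer bases into each time lag of the top-level convolutive basis. The function names (`shift`, `nmf`, `cnmf`, `train_layered_bases`) and all parameter values are hypothetical.

```python
import numpy as np

EPS = 1e-9  # guards against division by zero in the multiplicative updates


def shift(X, t):
    """Shift columns right by t (left if t < 0), filling with zeros."""
    out = np.zeros_like(X)
    if t == 0:
        out[:] = X
    elif t > 0:
        out[:, t:] = X[:, :-t]
    else:
        out[:, :t] = X[:, -t:]
    return out


def nmf(V, rank, n_iter=200, seed=0):
    """Plain NMF, V ~ W @ H, Euclidean multiplicative updates."""
    rng = np.random.default_rng(seed)
    W = rng.random((V.shape[0], rank)) + EPS
    H = rng.random((rank, V.shape[1])) + EPS
    for _ in range(n_iter):
        H *= (W.T @ V) / (W.T @ W @ H + EPS)
        W *= (V @ H.T) / (W @ H @ H.T + EPS)
    return W, H


def cnmf_reconstruct(W, H):
    """Convolutive model: sum over time lags t of W[t] @ shift(H, t)."""
    return sum(W[t] @ shift(H, t) for t in range(W.shape[0]))


def cnmf(V, rank, n_lags, n_iter=200, seed=0):
    """Convolutive NMF with one basis matrix per time lag (Euclidean cost)."""
    rng = np.random.default_rng(seed)
    W = rng.random((n_lags, V.shape[0], rank)) + EPS
    H = rng.random((rank, V.shape[1])) + EPS
    for _ in range(n_iter):
        R = cnmf_reconstruct(W, H)
        num = sum(W[t].T @ shift(V, -t) for t in range(n_lags))
        den = sum(W[t].T @ shift(R, -t) for t in range(n_lags))
        H *= num / (den + EPS)
        for t in range(n_lags):
            R = cnmf_reconstruct(W, H)  # refresh after each basis update
            Ht = shift(H, t)
            W[t] *= (V @ Ht.T) / (R @ Ht.T + EPS)
    return W, H


def train_layered_bases(V, ranks, n_lags):
    """Assumed layering: NMF on every layer except the last, CNMF on top."""
    lower = np.eye(V.shape[0])
    X = V
    for r in ranks[:-1]:
        W, X = nmf(X, r)      # activations X feed the next layer
        lower = lower @ W     # accumulate the product of lower-layer bases
    W_top, H = cnmf(X, ranks[-1], n_lags)
    # Map each lag of the top-level basis back to the input (spectrogram) space.
    W_comb = np.stack([lower @ W_top[t] for t in range(n_lags)])
    return W_comb, H
```

Under these assumptions, `train_layered_bases(V, [8, 4], 3)` applied to a nonnegative magnitude spectrogram `V` returns a stack of three combined basis matrices and the top-level activations. At prediction time, the combined bases of the individual speakers would be concatenated and held fixed while only the activations are updated on the mixture.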
