Abstract

This paper introduces a monaural speech separation/enhancement technique based on non-negative Tucker decomposition (NTD). In the proposed work, the effect of the sparsity regularization factor on the separation of the mixed signal is included in the generalized cost function of NTD. With the proposed algorithm, the vector components of both the target and the mixed signal can be exploited and used to separate any monaural mixture. Experiments were conducted on monaural data generated by mixing speech signals from two speakers, and by mixing noise and speech signals, using the TIMIT and NOISEX-92 datasets. The separation results are compared with those of other existing algorithms in terms of the correlation of the separated signal with the original signal, signal-to-distortion ratio, perceptual evaluation of speech quality, and short-time objective intelligibility. Further, to obtain more conclusive information about separation ability, speech recognition was performed using the Kaldi toolkit. The recognition results are compared in terms of word error rate (WER) using MFCC-based features. Results show that the average WER improvement of the proposed algorithm over the nearest-performing algorithm is up to 2.7% for mixed speech from two speakers and 1.52% for noisy speech input.
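
For readers unfamiliar with the recognition metric, WER is conventionally defined as the number of word substitutions, deletions, and insertions needed to turn the recognized transcript into the reference, divided by the number of reference words. The following is a minimal sketch of that standard definition only; it is not the paper's evaluation code, Kaldi ships its own scoring tools, and the function name `word_error_rate` is a hypothetical one chosen for illustration.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Illustrative WER: word-level edit distance / number of reference words.

    Hypothetical helper, not taken from the paper or from Kaldi.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")

    # prev[j] = edit distance between the reference words processed so far
    # (initially none) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,         # deletion: reference word missing from hypothesis
                curr[j - 1] + 1,     # insertion: extra word in hypothesis
                prev[j - 1] + cost,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)
```

Under this definition a lower WER means better recognition, so an improvement is a reduction in the score obtained on the separated or enhanced speech.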
