Supervised single-channel speech enhancement using ratio mask with joint dictionary learning

Long Zhang, Guangzhao Bao, Jing Zhang, Zhongfu Ye

doi:10.1016/j.specom.2016.06.001

Abstract

This paper proposes a novel single-channel speech enhancement structure that combines the advantages of the ratio mask (RM) and joint dictionary learning (JDL). The structure makes full use of the training data and overcomes some shortcomings of the generative dictionary learning (GDL) algorithm. RMs of the speech and the interferer provide discriminative information in both the training stage and the enhancement stage. In the training stage, the signals and their corresponding ideal RMs (IRMs) are used to learn the signal and IRM dictionaries jointly with the K-SVD algorithm. In the enhancement stage, the mixture signal and the mixture RM are sparsely represented over composite dictionaries composed of the learned signal and IRM dictionaries, which formulates a joint sparse coding (JSC) problem. The estimated RMs (ERMs) of the speech and the interferer in the mixture are then calculated and used to develop two soft mask (SM) filters. The proposed SM filters incorporate the ideal binary mask technique and a Wiener-type filter to make full use of the discriminative information provided by the ERMs, and they both strengthen the speech and suppress the interferer in the mixture. The proposed algorithms improve both speech intelligibility and quality. Experimental evaluations verify that they achieve performance comparable to a deep neural network (DNN) based mask estimator at lower computational cost and outperform the other tested algorithms.
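
To make the mask terminology concrete, the following is a minimal sketch rather than the authors' implementation. It assumes the common power-ratio definition of the IRM, |S|^2 / (|S|^2 + |N|^2), which the abstract does not spell out. The function `soft_mask` is a hypothetical illustration of pairing a binary-mask decision with a Wiener-type gain and should not be read as the paper's SM filters. All function names, the `eps` parameter, and the random test spectrograms are assumptions made for this example.

```python
import numpy as np

def ideal_ratio_masks(speech_mag, interferer_mag, eps=1e-12):
    """Return (speech IRM, interferer IRM) from magnitude spectrograms.

    Assumed definition (not taken from the paper): the power ratio
    |S|^2 / (|S|^2 + |N|^2) per time-frequency bin.
    """
    s_pow = speech_mag ** 2
    n_pow = interferer_mag ** 2
    total = s_pow + n_pow + eps  # eps guards against division by zero
    return s_pow / total, n_pow / total

def soft_mask(erm_speech, erm_interferer, eps=1e-12):
    """Hypothetical soft mask built from two estimated ratio masks.

    Bins where speech dominates are kept unchanged (binary-mask decision);
    the remaining bins are attenuated by a Wiener-type gain.
    """
    wiener = erm_speech ** 2 / (erm_speech ** 2 + erm_interferer ** 2 + eps)
    speech_dominant = erm_speech >= erm_interferer
    return np.where(speech_dominant, 1.0, wiener)

# Toy magnitude spectrograms (frequency bins x frames), random for illustration.
rng = np.random.default_rng(0)
speech = np.abs(rng.normal(size=(4, 5)))
noise = np.abs(rng.normal(size=(4, 5)))

# Ideal masks stand in here for the ERMs that joint sparse coding would estimate.
irm_s, irm_n = ideal_ratio_masks(speech, noise)
mask = soft_mask(irm_s, irm_n)

# Mixture magnitude approximated as the sum of the component magnitudes.
enhanced = mask * (speech + noise)
```

In the proposed structure the masks fed to the filters would be the ERMs recovered from joint sparse coding over the learned dictionaries; the ideal masks are used above only so that the sketch runs on its own.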

Full Text