Abstract

Speech signals reaching our ears are generally contaminated by background noise, which is detrimental to both speech quality and intelligibility. In this paper, we propose a nonlinear multi-scale decomposition-based deep speech enhancement method to improve the quality and intelligibility of the contaminated speech. In the proposed method, we apply Hurst exponent-based Empirical Mode Decomposition (HEMD) to the noisy speech to obtain a set of intrinsic mode functions (IMFs) and a residual. A Deep Neural Network (DNN) is trained for each extracted IMF and for the residual to learn a nonlinear mapping, through a deep hidden structure, that constructs a time–frequency mask. We formulate three deep speech enhancement structures based on three time–frequency masks: the Ideal Ratio Mask (IRM), the Ideal Binary Mask (IBM), and the Phase Sensitive Mask (PSM). Background noise also degrades the original phase of the clean speech and thereby introduces perceptual disturbance that harms speech quality and intelligibility. To avoid this degradation, an iterative procedure is adopted to compensate the phase in noisy conditions. A nonlinear Mel-scale weighted MSE (LMW-MSE) is used as the loss function during network training, so that the gradients are computed on a perceptually motivated nonlinear frequency scale. The output features of conventional deep neural networks are usually over-smoothed, which deteriorates speech quality. To alleviate this over-smoothing, frequency-independent spectral variance equalization is applied as a post-filter. The performance of the proposed deep enhancement methods is extensively evaluated and compared with DNNs based on the same time–frequency masks in various adverse noisy environments. The results demonstrate that the proposed deep speech enhancement methods perform better in terms of perceived speech quality and intelligibility.
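
For readers unfamiliar with the three time–frequency masks named in the abstract, the following is a minimal sketch of their common textbook definitions, computed from complex spectrograms. It is not the paper's implementation: the function name `tf_masks`, the additive mixture model Y = S + N, the 0 dB local criterion `lc_db`, the square-root form of the IRM, and the clipping of the PSM to [0, 1] are all assumptions made for illustration. In the proposed method such masks would be estimated per IMF and residual rather than on the full-band signal.

```python
import numpy as np

def tf_masks(S, N, lc_db=0.0, eps=1e-8):
    """Illustrative training targets from complex STFTs.

    S, N : complex arrays of equal shape (clean speech and noise spectra).
    lc_db: hypothetical local SNR criterion (dB) used by the binary mask.
    """
    Y = S + N                      # assumed additive noisy mixture
    s_pow = np.abs(S) ** 2
    n_pow = np.abs(N) ** 2

    # Ideal Ratio Mask: soft gain in [0, 1] from speech and noise energies.
    irm = np.sqrt(s_pow / (s_pow + n_pow + eps))

    # Ideal Binary Mask: 1 where the local SNR exceeds the criterion, else 0.
    snr_db = 10.0 * np.log10((s_pow + eps) / (n_pow + eps))
    ibm = (snr_db > lc_db).astype(np.float64)

    # Phase Sensitive Mask: |S|/|Y| * cos(phase(S) - phase(Y)),
    # written as Re(S * conj(Y)) / |Y|^2 and clipped to [0, 1] (assumption).
    psm = np.real(S * np.conj(Y)) / (np.abs(Y) ** 2 + eps)
    psm = np.clip(psm, 0.0, 1.0)

    return irm, ibm, psm
```

Each mask has the same shape as the input spectrograms and is applied by element-wise multiplication with the noisy magnitude spectrum. Unlike the IRM and IBM, the PSM depends on the phase difference between clean and noisy speech, which is why it is usually paired with phase-aware processing.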
