Single-channel speech enhancement based on improved frame-iterative spectral subtraction in the modulation domain

Chao Li,Ting Jiang,Sheng Wu

doi:10.23919/jcc.2021.09.009

Abstract

Aiming at the problem of music noise introduced by classical spectral subtraction, a short- time modulation domain (STM) spectral subtraction method has been successfully applied for singlechannel speech enhancement. However, due to the inaccurate voice activity detection (VAD), the residual music noise and enhanced performance still need to be further improved, especially in the low signal to noise ratio (SNR) scenarios. To address this issue, an improved frame iterative spectral subtraction in the STM domain (IMModSSub) is proposed. More specifically, with the inter-frame correlation, the noise subtraction is directly applied to handle the noisy signal for each frame in the STM domain. Then, the noisy signal is classified into speech or silence frames based on a predefined threshold of segmented SNR. With these classification results, a corresponding mask function is developed for noisy speech after noise subtraction. Finally, exploiting the increased sparsity of speech signal in the modulation domain, the orthogonal matching pursuit (OMP) technique is employed to the speech frames for improving the speech quality and intelligibility. The effectiveness of the proposed method is evaluated with three types of noise, including white noise, pink noise, and hfchannel noise. The obtained results show that the proposed method outperforms some established baselines at lower SNRs (−5 to +5 dB).

Full Text