Minimum mean-square error (MMSE)-based noise power spectral density (PSD) estimators are widely used for speech enhancement. However, MMSE noise PSD estimators assume that the noise changes more slowly than the speech, so they cannot track highly non-stationary noise sources. Moreover, in practice their performance depends largely on the accuracy of the a priori signal-to-noise ratio (SNR) estimate. In this paper, we introduce a noise PSD estimation algorithm for non-stationary noise conditions that uses a derivative-based high-pass filter. The proposed method processes the silent and speech frames of the noisy speech differently, because non-stationary noise can be mixed non-uniformly with silent and speech-dominated frames. We first introduce a spectral-flatness-based adaptive thresholding technique to detect speech activity in each noisy speech frame. Since a silent frame of the noisy speech contains only noise, the noise periodogram is computed directly from it without any filtering. Conversely, during speech activity a fourth-order derivative-based high-pass filter is applied to the noisy speech frame to filter out the clean speech components while leaving mostly the noise. The noise periodogram is then computed from the filtered signal, which counteracts the leakage of clean speech power into the noise estimate. The noise PSD estimate is obtained by recursively averaging the previously estimated noise PSD and the current noise periodogram. The proposed method tracks both rapidly changing and slowly varying noise PSDs more effectively than competing methods in non-stationary noise conditions over a wide range of SNR levels.
Extensive objective and subjective evaluations on the NOIZEUS corpus demonstrate that applying the proposed noise PSD estimate with MMSE-based speech enhancement methods produces higher-quality and more intelligible enhanced speech than the competing methods.
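
The per-frame procedure described above can be illustrated with a minimal Python sketch. It is not the authors' implementation, and several details are assumptions made only for illustration: the fixed flatness threshold (the paper uses an adaptive one), the smoothing factor alpha, the choice of a plain fourth-order backward difference with coefficients [1, -4, 6, -4, 1] as the high-pass filter, and all function names. The sketch also does not compensate for the gain the difference filter applies to high-frequency noise.

```python
import cmath
import math


def periodogram(frame):
    """Naive DFT periodogram |X(k)|^2 / N for bins 0..N//2."""
    n = len(frame)
    out = []
    for k in range(n // 2 + 1):
        acc = sum(x * cmath.exp(-2j * math.pi * k * t / n)
                  for t, x in enumerate(frame))
        out.append(abs(acc) ** 2 / n)
    return out


def spectral_flatness(psd, eps=1e-12):
    """Geometric mean over arithmetic mean: near 1 for flat (noise-like)
    spectra, near 0 for peaky (speech-like) spectra."""
    log_mean = sum(math.log(p + eps) for p in psd) / len(psd)
    return math.exp(log_mean) / (sum(psd) / len(psd) + eps)


def fourth_difference(frame):
    """Assumed high-pass filter: fourth-order backward difference.
    Samples before the start of the frame are treated as zero."""
    c = (1.0, -4.0, 6.0, -4.0, 1.0)
    return [sum(c[j] * frame[t - j] for j in range(5) if t - j >= 0)
            for t in range(len(frame))]


def update_noise_psd(prev_psd, frame, alpha=0.9, flatness_threshold=0.3):
    """One recursive-averaging step of the noise PSD estimate.

    alpha and flatness_threshold are hypothetical fixed values.
    Returns (new_psd, is_speech).
    """
    noisy = periodogram(frame)
    # Low flatness is taken to indicate speech activity.
    is_speech = spectral_flatness(noisy) < flatness_threshold
    # Speech frames are high-pass filtered first; silent frames are used as-is.
    source = fourth_difference(frame) if is_speech else frame
    current = periodogram(source)
    new_psd = [alpha * p + (1.0 - alpha) * c
               for p, c in zip(prev_psd, current)]
    return new_psd, is_speech
```

In use, prev_psd would hold N//2 + 1 bins for frames of length N, and update_noise_psd would be called once per frame with the previous output fed back in. A unit impulse frame, whose spectrum is flat, is treated as silent, whereas a pure tone is treated as speech and filtered before its periodogram is averaged in.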