Abstract

Minimum mean-square error (MMSE)-based noise power spectral density (PSD) estimators are widely used for speech enhancement. However, MMSE noise PSD estimators assume that the noise changes more slowly than the speech, so they cannot track highly non-stationary noise sources. Moreover, in practice the performance of an MMSE-based noise PSD estimator depends largely on the accuracy of the a priori signal-to-noise ratio (SNR) estimate. In this paper, we introduce a noise PSD estimation algorithm that uses a derivative-based high-pass filter in non-stationary noise conditions. The proposed method processes the silent and speech frames of the noisy speech differently when estimating the noise PSD, because non-stationary noise can be mixed non-uniformly with silent and speech-dominated frames. We first introduce a spectral-flatness-based adaptive thresholding technique to detect speech activity in the noisy speech frames. Since a silent frame of the noisy speech contains only noise, the noise periodogram is computed from it directly, without any filtering. Conversely, during speech activity a 4th-order derivative-based high-pass filter is applied to the noisy speech frame to filter out the clean speech components, leaving mostly noise. The noise periodogram is then computed from the filtered signal, which counteracts the leakage of clean speech power. The noise PSD estimate is obtained by recursively averaging the previously estimated noise PSD and the current noise periodogram. The proposed method is found to track both rapidly changing and slowly varying noise PSDs more effectively than competing methods in non-stationary noise conditions over a wide range of SNR levels.
Extensive objective and subjective scores on the NOIZEUS corpus demonstrate that applying the proposed noise PSD estimate in MMSE-based speech enhancement methods produces enhanced speech of higher quality and intelligibility than the competing methods.
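The recursive-averaging step described in the abstract can be illustrated with a minimal sketch. It assumes a first-order update of the form new = alpha * previous + (1 - alpha) * periodogram, applied independently in each frequency bin. The function name `update_noise_psd` and the value of `alpha` are hypothetical and are not taken from the paper.

```python
def update_noise_psd(prev_psd, noise_periodogram, alpha=0.9):
    """One recursive-averaging step, applied per frequency bin.

    alpha is a hypothetical smoothing constant, not a value from the paper.
    """
    if len(prev_psd) != len(noise_periodogram):
        raise ValueError("PSD and periodogram must have the same number of bins")
    return [alpha * p + (1.0 - alpha) * v
            for p, v in zip(prev_psd, noise_periodogram)]


# Toy example with two frequency bins: the noise power in bin 0 jumps
# from 1.0 to 4.0, while bin 1 stays at 1.0.
psd = [1.0, 1.0]
for _ in range(3):
    psd = update_noise_psd(psd, [4.0, 1.0], alpha=0.5)
```

A smaller `alpha` lets the estimate follow abrupt noise changes more quickly, at the cost of a noisier estimate; a larger `alpha` does the opposite.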

Highlights

  • Speech processing systems are closely linked to daily life, for example in mobile communication systems, hearing aid devices, and voice-operated autonomous systems

  • The noise periodogram is computed from the filtered signal, which mitigates the risk of leaking speech power into the noise estimate

  • On the other hand, during speech activity of y(n, l) (1 ≤ l ≤ L − 1), s(n, l) remains mixed with v(n, l), which risks leaking speech power into the estimated noise power, |V(l, m)|². To cope with this problem, we have found that applying a derivative-based high-pass filter to y(n, l) during speech activity filters out the components of s(n, l) before |V(l, m)|² is estimated
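The exact filter coefficients are not given in this excerpt. As an illustration only, the sketch below uses a fourth-order backward difference, y[n] = x[n] − 4x[n−1] + 6x[n−2] − 4x[n−3] + x[n−4], which is one common derivative-based high-pass FIR filter. The function name `fourth_difference` is hypothetical, and the paper's filter may differ.

```python
def fourth_difference(x):
    """Apply y[n] = x[n] - 4x[n-1] + 6x[n-2] - 4x[n-3] + x[n-4].

    Assumed stand-in for the paper's derivative-based high-pass filter.
    Returns only the samples for which all five inputs exist,
    i.e. len(x) - 4 values.
    """
    coeffs = (1.0, -4.0, 6.0, -4.0, 1.0)
    return [sum(c * x[n - k] for k, c in enumerate(coeffs))
            for n in range(4, len(x))]


slow = [1.0] * 8                         # constant (DC) input: lowest frequency
fast = [(-1.0) ** n for n in range(8)]   # sign alternation: highest frequency

slow_out = fourth_difference(slow)       # the constant input is removed entirely
fast_out = fourth_difference(fast)       # the alternating input is amplified
```

The coefficients sum to zero, so a constant input is cancelled, while the fastest alternation is scaled by the sum of their absolute values, which is 16. Frequencies in between are attenuated progressively less as frequency rises.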


Summary

Introduction

Speech processing systems are closely linked to daily life, for example in mobile communication systems, hearing aid devices, and voice-operated autonomous systems. In the minimum statistics (MS) method, the noise PSD estimate is obtained by tracking the minimum of the smoothed noisy speech power spectrum in each frequency bin within a fixed time window. Although the IMMSE method [25] uses the estimated speech spectral power, its decision-directed (DD) approach relies on the past estimated noise power, so it may still fail to track an abruptly changing noise PSD in the current noisy speech frame. Zhang et al. proposed a noise PSD tracking algorithm that incorporates a log-spectral power MMSE (MMSE-LSA) estimator. In this method, the smoothing parameter used in the recursive operation for noise PSD estimation is adjusted based on the SPP method. The proposed method estimates the noise PSD by processing the silent and speech frames of the noisy speech differently. For this purpose, the speech activity is first detected using a spectral-flatness-based adaptive thresholding technique.
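Spectral flatness is conventionally defined as the ratio of the geometric mean to the arithmetic mean of a power spectrum. The sketch below shows how that measure can separate a flat, noise-like spectrum from a peaky one. It assumes a fixed threshold for simplicity, whereas the proposed method adapts the threshold. The names `spectral_flatness` and `is_speech_frame` and the threshold value are hypothetical.

```python
import math


def spectral_flatness(power_spectrum, eps=1e-12):
    """Geometric mean divided by arithmetic mean of a power spectrum.

    Values near 1 indicate a flat spectrum; values near 0 indicate a
    spectrum dominated by a few bins. eps guards against log(0).
    """
    n = len(power_spectrum)
    if n == 0:
        raise ValueError("empty spectrum")
    log_mean = sum(math.log(p + eps) for p in power_spectrum) / n
    arith_mean = sum(power_spectrum) / n + eps
    return math.exp(log_mean) / arith_mean


def is_speech_frame(power_spectrum, threshold=0.5):
    """Flag a frame as speech when its spectrum is peaky (low flatness).

    threshold is a fixed, hypothetical value; the paper adapts it.
    """
    return spectral_flatness(power_spectrum) < threshold


flat = [1.0] * 8               # white-noise-like spectrum
peaky = [100.0] + [0.01] * 7   # energy concentrated in one bin
```

With these toy spectra, `is_speech_frame(flat)` is false and `is_speech_frame(peaky)` is true. Real non-stationary noise is not always spectrally flat, which is one reason a fixed threshold like the one assumed here would be too crude in practice.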

Proposed noise PSD tracking algorithm
Speech enhancement using estimated noise PSD
Experimental setup
Subjective evaluation measure for speech enhancement
Results and discussions
Computational complexity evaluation of noise PSD estimators
Methods
Objective intelligibility evaluation of enhanced speech
Conclusion
