Abstract

Although most noise reduction algorithms depend critically on the noise power spectral density (PSD), most procedures for noise PSD estimation fail to obtain good estimates in nonstationary noise conditions. Recently, a DFT-subspace-based method was proposed which improves noise PSD estimation under these conditions. However, this approach requires an eigenvalue decomposition per DFT bin and might be too computationally demanding for low-complexity applications such as hearing aids. In this paper we present a noise tracking method with low complexity whose noise tracking performance is nevertheless approximately similar to that of the DFT-subspace approach. The presented method uses a periodogram with a resolution that is higher than the spectral resolution used in the noise reduction algorithm itself. This increased resolution enables estimation of the noise PSD even when speech energy is present at the time-frequency point under consideration. This holds in particular for voiced speech sounds, which can be modelled using a small number of complex exponentials.

Highlights

  • The growing interest in mobile digital speech processing devices for both human-to-human and human-to-machine communication has led to an increased use of these devices in noisy conditions

  • For performance evaluation of the proposed noise power spectral density (PSD) estimation method, we compare it with three reference methods: noise PSD estimation based on minimum statistics (MS) as proposed in [10]; quantile-based (QB) noise PSD estimation as proposed in [12], with quantile parameter p = 0.5 and a buffer length of 20 frames; and noise PSD estimation based on the discrete Fourier transform (DFT)-subspace approach as proposed in [17]

  • … and 4.2, we see that the performance of the proposed method is quite similar to that of the recently presented DFT-subspace-based method [17]. The latter approach is based on a Karhunen-Loève transform (KLT) of a sequence of complex DFT coefficients observed in the same frequency bin across time. This implies the use of a KLT for each DFT bin, whereas the proposed method is based on a single high-resolution DFT (HR-DFT) per super-frame. The DFT-subspace approach and the proposed method are based on different signal models
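
The quantile-based reference configuration mentioned above (p = 0.5, buffer of 20 frames) can be illustrated with a minimal sketch. It is not the implementation of [12]; it only assumes that the noise PSD in each DFT bin is taken as the p-quantile of the most recent periodogram values in that bin. The function name `quantile_noise_psd` and the array layout are hypothetical choices made for this example.

```python
import numpy as np

def quantile_noise_psd(periodograms, p=0.5, buffer_len=20):
    """Illustrative quantile-based noise PSD tracker (assumed form, not [12] verbatim).

    periodograms: array of shape (n_frames, n_bins) holding |Y(k, i)|^2.
    Returns an array of the same shape; row i is the p-quantile, per bin,
    over the last `buffer_len` frames up to and including frame i.
    """
    periodograms = np.asarray(periodograms, dtype=float)
    n_frames, _ = periodograms.shape
    noise_psd = np.empty_like(periodograms)
    for i in range(n_frames):
        # Use a shorter buffer at the start, until enough frames are available.
        start = max(0, i - buffer_len + 1)
        noise_psd[i] = np.quantile(periodograms[start:i + 1], p, axis=0)
    return noise_psd
```

With p = 0.5 the estimate is a running median, so a short burst of speech energy occupying fewer than half of the buffered frames in a bin does not move the estimate, whereas a change in noise level is followed only after it fills a substantial part of the buffer.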


Summary

Introduction

The growing interest in mobile digital speech processing devices for both human-to-human and human-to-machine communication has led to an increased use of these devices in noisy conditions. In such conditions, it is desirable to apply noise reduction as a preprocessing step in order to extend the SNR range in which the performance of these applications is satisfactory. A group of methods often used for noise reduction in the single-microphone setup are the so-called discrete Fourier transform (DFT) domain-based approaches. These methods work on a frame-by-frame basis: the noisy signal is divided into windowed time frames such that both the quasistationarity constraints imposed by the input signal and the delay constraints imposed by the application at hand are satisfied. From the resulting noisy speech DFT coefficients the corresponding clean speech DFT coefficients are estimated, typically by using Bayesian estimators [1], followed by an inverse DFT to the time domain and an overlap-add procedure to synthesize the enhanced signal.
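
The frame-by-frame processing chain described above can be sketched as follows. This is an illustration under stated assumptions rather than the paper's algorithm: it uses a square-root Hann window with 50% overlap, and a simple Wiener-type gain stands in for the Bayesian estimators of [1]. The function name `enhance`, the frame length, and the convention that `noise_psd` is expressed on the scale of the windowed, unnormalized periodogram are all choices made for this example.

```python
import numpy as np

def enhance(noisy, noise_psd, frame_len=256):
    """Illustrative DFT-domain noise reduction with overlap-add.

    noisy:     1-D array of noisy time-domain samples.
    noise_psd: array of length frame_len // 2 + 1, on the same scale as the
               squared magnitude of the windowed frame's DFT (assumption).
    """
    hop = frame_len // 2
    n = np.arange(frame_len)
    # Square-root periodic Hann window, applied at analysis and synthesis;
    # the squared windows sum to one at 50% overlap.
    window = np.sqrt(0.5 - 0.5 * np.cos(2.0 * np.pi * n / frame_len))
    out = np.zeros(len(noisy))
    for start in range(0, len(noisy) - frame_len + 1, hop):
        frame = noisy[start:start + frame_len] * window
        spec = np.fft.rfft(frame)
        periodogram = np.abs(spec) ** 2
        # Wiener-type gain, a stand-in for the Bayesian estimators of [1].
        gain = np.maximum(1.0 - noise_psd / np.maximum(periodogram, 1e-12), 0.0)
        # Inverse DFT, synthesis window, and overlap-add.
        out[start:start + frame_len] += np.fft.irfft(gain * spec, n=frame_len) * window
    return out
```

The gain depends directly on `noise_psd`, which is why the quality of the noise PSD estimate matters for the whole chain. With `noise_psd` set to zero the gain is one everywhere, and the output reproduces the input except in the first and last half frame, where only one window contributes.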
