Abstract

As a fundamental part of single-microphone speech quality enhancement, noise power spectrum estimation is particularly challenging in adverse environments with a low signal-to-noise ratio (SNR) and highly non-stationary background noise. In this paper, we propose a novel scheme that incorporates properties of human speech, such as the pitch properties of voiced speech and the statistical properties of unvoiced-speech durations, into subband spectral tracking to estimate the power spectrum of non-stationary noise. We show that the proposed method estimates the noise power spectrum more accurately and more quickly when the noise is highly non-stationary, tracking bursts of noise 4–6 times faster than competing methods. We also show that the mean square error of its noise spectrum estimate is on average 15% lower than that of competing methods. The proposed algorithm is then combined with the conventional minimum mean square error short-time spectral amplitude (MMSE-STSA) estimator, and the overall system is evaluated in a speech enhancement application. Simulation results show that the segmental SNR improvement of the proposed system is on average 0.9 dB higher than that of the competing system, and its mean opinion score (MOS) improvement is on average 0.17 higher.
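The abstract reports gains in segmental SNR without defining the measure. A minimal sketch of one common definition follows: the per-frame SNR in dB between a clean reference and a processed signal, clipped and averaged over frames, with the improvement taken as the enhanced signal's score minus the noisy input's score. The frame length (256 samples, non-overlapping), the clipping range of [-10, 35] dB, and the function names `segmental_snr` and `segsnr_improvement` are illustrative assumptions, not the evaluation setup used in the paper.

```python
import math
from typing import Sequence


def segmental_snr(clean: Sequence[float], processed: Sequence[float],
                  frame_len: int = 256, floor_db: float = -10.0,
                  ceil_db: float = 35.0, eps: float = 1e-10) -> float:
    """Mean frame-wise SNR (dB) of `processed` against the `clean` reference.

    Assumed conventions (not taken from the paper): non-overlapping frames of
    `frame_len` samples, each frame's SNR clipped to [floor_db, ceil_db], and
    any trailing partial frame discarded.
    """
    if len(clean) != len(processed):
        raise ValueError("signals must have the same length")
    n_frames = len(clean) // frame_len
    if n_frames == 0:
        raise ValueError("signal is shorter than one frame")

    total = 0.0
    for k in range(n_frames):
        start = k * frame_len
        sig_energy = 0.0  # energy of the clean reference in this frame
        err_energy = 0.0  # energy of the residual (clean minus processed)
        for n in range(start, start + frame_len):
            sig_energy += clean[n] ** 2
            err_energy += (clean[n] - processed[n]) ** 2
        # eps keeps silent or error-free frames finite; clipping limits their weight
        snr_db = 10.0 * math.log10((sig_energy + eps) / (err_energy + eps))
        total += min(max(snr_db, floor_db), ceil_db)
    return total / n_frames


def segsnr_improvement(clean: Sequence[float], noisy: Sequence[float],
                       enhanced: Sequence[float], **kwargs) -> float:
    """Segmental SNR gain of the enhanced signal over the unprocessed noisy input."""
    return segmental_snr(clean, enhanced, **kwargs) - segmental_snr(clean, noisy, **kwargs)
```

As a sanity check, if the enhanced signal equals 0.9 times the clean signal, every frame's residual has 1% of the reference energy, giving 20 dB; a noisy input equal to 0.5 times the clean signal scores about 6.02 dB, so the improvement is about 13.98 dB. Clipping keeps near-perfect or near-silent frames from dominating the average.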

