Abstract

A priori signal-to-noise ratio (SNR) estimation and noise estimation are important for speech enhancement. In this paper, a novel modified decision-directed (DD) a priori SNR estimation approach based on single-frequency entropy, named DDBSE, is proposed. DDBSE replaces the fixed weighting factor of the DD approach with an adaptive one calculated from the change in single-frequency entropy. In addition, a new noise power estimation approach based on unbiased minimum mean square error (MMSE) estimation and voice activity detection (VAD), named UMVAD, is proposed. UMVAD adopts different strategies to estimate noise in order to reduce both over-estimation and under-estimation of noise. It improves the classical statistical model-based VAD by replacing the original fixed threshold with an adaptive one, and it modifies the unbiased MMSE-based noise estimation approach by replacing the original fixed a priori speech presence probability with an adaptive one calculated from entropy. Experimental results show that DDBSE provides greater noise suppression than DD and that UMVAD improves the accuracy of noise estimation. Compared to existing approaches, speech enhancement based on UMVAD and DDBSE obtains a better segmental SNR score and a better composite measure covl score, especially in adverse conditions such as non-stationary noise and low SNR.
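
As a point of reference for the fixed weighting factor that DDBSE replaces, the classical DD update for a single frequency bin can be sketched as below. This is a minimal sketch of the standard estimator, not of the DDBSE rule. The function name, the parameter names, the default alpha = 0.98, and the floor xi_min are illustrative assumptions, and the same noise power is used in both terms for simplicity.

```python
def dd_a_priori_snr(prev_clean_power, noise_power, noisy_power,
                    alpha=0.98, xi_min=1e-3):
    """Classical decision-directed a priori SNR estimate for one bin.

    prev_clean_power: squared clean-speech amplitude estimated in the
                      previous frame (assumed available from the enhancer).
    noise_power:      current noise power estimate for this bin.
    noisy_power:      squared magnitude of the noisy spectrum in this frame.
    alpha:            fixed weighting factor (assumed default, hypothetical).
    xi_min:           lower bound on the estimate (assumed, hypothetical).
    """
    # A posteriori SNR of the current frame.
    gamma = noisy_power / noise_power
    # Maximum-likelihood style term, half-wave rectified to stay non-negative.
    ml_term = max(gamma - 1.0, 0.0)
    # Weighted sum of the previous-frame estimate and the current ML term.
    xi = alpha * prev_clean_power / noise_power + (1.0 - alpha) * ml_term
    return max(xi, xi_min)
```

A value of alpha close to 1 smooths the estimate strongly but reacts slowly to speech onsets, which is the trade-off an adaptive weighting factor is meant to ease.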

Highlights

  • Single-channel speech enhancement has been used widely in various speech communication systems such as speech recognition, speech coding, and hearing aid devices

  • With the emergence of speech enhancement based on statistical models, a commonly used approach, the minimum mean square error (MMSE) spectral amplitude estimator, was proposed by Ephraim and Malah in [1]

  • To test the performance of these approaches, two objective measures are used: segmental signal-to-noise ratio (segSNR) and the composite measure covl [25]. segSNR indicates denoising performance, which relates to speech quality, while covl has been regarded as a preferable measure of speech intelligibility
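
The segmental SNR mentioned above is commonly computed as the mean of per-frame SNRs in dB, with each frame value clamped to a fixed range. The following is a minimal sketch under that common definition. The frame length, the clamping range of -10 dB to 35 dB, and the function name are assumptions and are not taken from this paper.

```python
import math


def segmental_snr(clean, enhanced, frame_len=256, lo_db=-10.0, hi_db=35.0):
    """Mean per-frame SNR in dB between a clean and an enhanced signal.

    clean, enhanced: equal-length sequences of time-domain samples.
    frame_len, lo_db, hi_db: assumed settings (hypothetical defaults).
    """
    eps = 1e-10  # avoids division by zero and log of zero
    scores = []
    # Non-overlapping frames; a trailing partial frame is ignored.
    for start in range(0, len(clean) - frame_len + 1, frame_len):
        s = clean[start:start + frame_len]
        e = enhanced[start:start + frame_len]
        signal_energy = sum(x * x for x in s)
        error_energy = sum((x - y) ** 2 for x, y in zip(s, e))
        snr_db = 10.0 * math.log10((signal_energy + eps) / (error_energy + eps))
        # Clamp so silent or perfectly matched frames do not dominate the mean.
        scores.append(min(max(snr_db, lo_db), hi_db))
    return sum(scores) / len(scores) if scores else float("nan")
```

Clamping is a design choice that keeps the average meaningful when some frames contain almost no speech energy.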

Summary

Introduction

Single-channel speech enhancement has been used widely in various speech communication systems such as speech recognition, speech coding, and hearing aid devices. In [2], based on the assumption that additive noise is stationary and that the noise energy does not change significantly from frame to frame, Soon and Koh proposed a low-distortion speech enhancement approach using an adaptive weighting factor. Hasan et al. [3] proposed a way to calculate the optimal weighting factor based on MMSE to account for abrupt changes in the speech spectral amplitude. To compensate for the bias caused by a priori SNR estimation in the traditional MMSE approach, Gerkmann and Hendriks [19] proposed an unbiased MMSE-based noise estimator that uses the a posteriori speech presence probability as the weight of the recursion.
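
The idea of weighting a noise recursion by the a posteriori speech presence probability (SPP) can be sketched for one frequency bin as follows. This is a sketch in the spirit of the estimator attributed to [19], not a reproduction of it. The fixed prior of 0.5, the 15 dB a priori SNR assumed under speech presence, the smoothing constant of 0.8, and all names are assumptions. The fixed prior is the quantity that UMVAD is said to replace with an entropy-based adaptive value, whose formula is not given in this summary.

```python
import math


def spp_noise_update(noisy_power, prev_noise_power,
                     prior_spp=0.5, xi_h1_db=15.0, smoothing=0.8):
    """One SPP-weighted noise power update for a single bin.

    Returns (new_noise_power, posterior_spp).
    prior_spp, xi_h1_db, smoothing: assumed constants (hypothetical defaults).
    """
    xi_h1 = 10.0 ** (xi_h1_db / 10.0)          # a priori SNR under speech presence
    prior_ratio = (1.0 - prior_spp) / prior_spp
    # Likelihood-ratio exponent built from the a posteriori SNR.
    exponent = -(noisy_power / prev_noise_power) * xi_h1 / (1.0 + xi_h1)
    post_spp = 1.0 / (1.0 + prior_ratio * (1.0 + xi_h1) * math.exp(exponent))
    # Conditional noise estimate: trust the noisy power when speech is
    # unlikely, and keep the previous estimate when speech is likely.
    cond_noise = (1.0 - post_spp) * noisy_power + post_spp * prev_noise_power
    # First-order recursive smoothing of the conditional estimate.
    new_noise = smoothing * prev_noise_power + (1.0 - smoothing) * cond_noise
    return new_noise, post_spp
```

When the noisy power is far above the previous noise estimate, the posterior SPP approaches 1 and the estimate is left almost unchanged; when it is near or below the noise level, the estimate moves toward the observed power.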

Review of basic principle
The DD approach for a priori SNR estimation
Statistical model VAD-based noise estimation
Calculation of a priori SPP
Experimental results and discussion
Conclusions