Abstract

We consider the estimation of the speech short-time spectral amplitude (STSA) using a parametric Bayesian cost function and speech prior distribution. First, new schemes are proposed for the estimation of the cost function parameters, using an initial estimate of the speech STSA along with the noise masking feature of the human auditory system. This information is further employed to derive a new technique for the gain flooring of the STSA estimator. Next, to achieve better compliance with the noisy speech in the estimator’s gain function, we use the generalized Gamma distribution to model the STSA prior and propose an SNR-based scheme for the estimation of its parameters. It is shown that in Bayesian STSA estimators, using a rough STSA estimate in the parameter selection for the cost function and the speech prior leads to more efficient control over the gain function values. Performance evaluation in different noisy scenarios demonstrates the superiority of the proposed methods over the existing parametric STSA estimators in terms of the achieved noise reduction and introduced speech distortion.

Highlights

  • Speech enhancement aims at the reduction of corrupting noise in speech signals while keeping the introduced speech distortion at the minimum possible level

  • We present a simple approach for the selection of the Generalized Gamma distribution (GGD) parameter c for the proposed short-time spectral amplitude (STSA) estimator

  • According to (22), the shape parameter c takes on its values as a linearly increasing function of the SNR in its possible range between cmin and cmax, leading to the appropriate adjustment of the estimator gain function based on the average power of the speech STSA components at each frame


Summary

Introduction

Speech enhancement aims to reduce the corrupting noise in speech signals while keeping the introduced speech distortion as low as possible. As experiments show, excessive distortion may appear in the enhanced speech when the STSA estimator is used with this parameter choice, especially at high SNRs. We propose to use the adaptive approach in (11) as the basis for the selection of β, and to further apply the scheme in (12) as a form of frequency weighting that accounts for the psychoacoustics of the human auditory system within each time frame. According to (22), the shape parameter c is a linearly increasing function of the SNR over its possible range between cmin and cmax, so that the estimator gain function is adjusted according to the average power of the speech STSA components in each frame. We employed the gain flooring scheme in (16) in cases where the proposed gain flooring is not used, since the closest results to
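The SNR-dependent choice of the shape parameter c described above can be illustrated with a minimal sketch. Equation (22) is not reproduced in this summary, so the code below only assumes a clamped linear map from a frame SNR in dB to the interval between cmin and cmax. The function name, the SNR breakpoints, and the default values of `c_min` and `c_max` are hypothetical placeholders, not values taken from the paper.

```python
def ggd_shape_from_snr(snr_db, snr_min_db=-10.0, snr_max_db=30.0,
                       c_min=0.5, c_max=2.0):
    """Map a frame SNR (dB) to a GGD shape parameter c.

    Assumption: c grows linearly with the SNR and is clamped to
    [c_min, c_max]. All default values are illustrative placeholders.
    """
    if snr_max_db <= snr_min_db:
        raise ValueError("snr_max_db must exceed snr_min_db")
    if c_max < c_min:
        raise ValueError("c_max must not be smaller than c_min")

    # Normalized position of the SNR inside [snr_min_db, snr_max_db].
    t = (snr_db - snr_min_db) / (snr_max_db - snr_min_db)
    # Clamp so that c never leaves its allowed range.
    t = min(1.0, max(0.0, t))
    # Linear interpolation between the two bounds.
    return c_min + t * (c_max - c_min)


if __name__ == "__main__":
    for snr in (-20.0, 0.0, 10.0, 40.0):
        print(f"SNR = {snr:6.1f} dB -> c = {ggd_shape_from_snr(snr):.3f}")
```

Under this assumed mapping, low-SNR frames receive a value of c near its lower bound and high-SNR frames a value near its upper bound. Whether the SNR in (22) is expressed in dB or on a linear scale is not stated here, so the dB convention is also an assumption.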

Proposed choice of α
Proposed choice of β
Conclusions
