Abstract

The goal in speech enhancement is to obtain an estimate of clean speech from the noisy signal by minimizing a chosen distortion measure (risk). Often, this results in an estimate that depends on the unknown clean signal or its statistics. Since access to such priors is limited or impractical, one has to rely on an estimate of the clean signal statistics. In this paper, we develop a risk estimation framework for speech enhancement, in which one optimizes an unbiased estimate of the risk instead of the actual risk. The estimated risk is expressed solely as a function of the noisy observations and the noise statistics. Hence, the corresponding denoiser does not require the clean speech prior. We consider several speech-specific, perceptually relevant distortion measures and develop corresponding unbiased estimates. Minimizing the risk estimates gives rise to denoisers that are nonlinear functions of the a posteriori SNR. Listening tests show that, within the risk estimation framework, the Itakura-Saito and weighted hyperbolic cosine distortions are superior to the other measures. Comparisons in terms of perceptual evaluation of speech quality (PESQ), segmental SNR (SSNR), source-to-distortion ratio (SDR), and short-time objective intelligibility (STOI) also indicate superior performance for these two distortion measures. For SNRs greater than 5 dB, the proposed approach results in better denoising performance, in terms of both objective and subjective assessment, than techniques based on the Wiener filter, log-MSE minimization, and Bayesian nonnegative matrix factorization.
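
To make the idea of optimizing an unbiased risk estimate concrete, the following is a minimal sketch for the simplest case only: squared error with a pointwise gain, not any of the perceptual distortion measures considered in the paper. It assumes a real-valued coefficient model y = x + w with Gaussian noise w of known variance sigma2. The function names `sure_mse` and `sure_gain` are hypothetical, and the clipping of the gain at zero is an added safeguard rather than part of the derivation. For a fixed gain a, Stein's lemma gives an estimate of E[(a*y - x)^2] that involves only y and sigma2; treating a as constant per coefficient and minimizing yields a = 1 - 1/xi, where xi = y^2/sigma2 is the a posteriori SNR.

```python
import numpy as np

def sure_mse(a, y, sigma2):
    """Unbiased estimate of E[(a*y - x)^2] for a fixed gain a.

    Assumes y = x + w with w ~ N(0, sigma2). The clean value x never
    appears: only the noisy observation and the noise variance do.
    """
    return (a - 1.0) ** 2 * y ** 2 + 2.0 * a * sigma2 - sigma2

def sure_gain(y, sigma2, eps=1e-12):
    """Gain that minimizes sure_mse over a, as a function of the
    a posteriori SNR xi = y^2 / sigma2. Negative gains are clipped
    to zero (an assumption added here for robustness)."""
    xi = np.maximum(np.asarray(y, dtype=float) ** 2, eps) / sigma2
    return np.maximum(1.0 - 1.0 / xi, 0.0)

# Usage: denoise a vector of noisy coefficients with known noise variance.
noisy = np.array([2.0, 0.5, -3.0])
denoised = sure_gain(noisy, sigma2=1.0) * noisy
```

The unbiasedness holds for a gain that does not depend on y. Plugging in the data-dependent minimizer is a heuristic, so in practice y^2 would typically be smoothed before the gain is computed.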
