Abstract

Model-based speech enhancement algorithms that employ trained models, such as codebooks, hidden Markov models, Gaussian mixture models, etc., containing representations of speech such as linear predictive coefficients, mel-frequency cepstrum coefficients, etc., have been found to be successful in enhancing noisy speech corrupted by nonstationary noise. However, these models are typically trained on speech data from multiple speakers under controlled acoustic conditions. In this paper, we introduce the notion of context-dependent models that are trained on speech data with one or more aspects of context, such as speaker, acoustic environment, speaking style, etc. In scenarios where the modeled and observed contexts match, context-dependent models can be expected to result in better performance, whereas context-independent models are preferred otherwise. In this paper, we present a Bayesian framework that automatically provides the benefits of both models under varying contexts. As several aspects of the context remain constant over an extended period during usage, a memory-based approach that exploits information from past data is employed. We use a codebook-based speech enhancement technique that employs trained models of speech and noise linear predictive coefficients as an example model-based approach. Using speaker, acoustic environment, and speaking style as aspects of context, we demonstrate the robustness of the proposed framework for different context scenarios, input signal-to-noise ratios, and number of contexts modeled.
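The core idea of combining context-dependent (CD) and context-independent (CI) estimates can be illustrated with a minimal sketch. The function names, the Gaussian likelihood model, and the exponential-smoothing prior update below are illustrative assumptions, not the paper's exact formulation: each model's estimate is weighted by its posterior probability given the observation, and, since context persists over time, past posteriors can be fed back into the prior.

```python
import numpy as np

def combine_estimates(y, x_cd, x_ci, sigma=1.0, prior_cd=0.5):
    """Posterior-weighted combination of CD and CI speech estimates.

    Hypothetical sketch: scores each model's estimate with a Gaussian
    likelihood of the noisy observation y, then returns the Bayesian
    model-averaged estimate and the CD posterior probability.
    """
    lik_cd = np.exp(-np.sum((y - x_cd) ** 2) / (2 * sigma ** 2))
    lik_ci = np.exp(-np.sum((y - x_ci) ** 2) / (2 * sigma ** 2))
    post_cd = prior_cd * lik_cd / (prior_cd * lik_cd + (1 - prior_cd) * lik_ci)
    return post_cd * x_cd + (1 - post_cd) * x_ci, post_cd

def update_prior(prior_cd, post_cd, alpha=0.9):
    """Memory-based prior update (exponential smoothing, an assumed choice):
    as evidence accumulates that the modeled context matches the observed
    one, the prior shifts toward the CD model."""
    return alpha * prior_cd + (1 - alpha) * post_cd
```

When the observation is close to the CD estimate, the posterior weight shifts toward the CD model, so matched contexts automatically benefit from the specialized model while mismatched contexts fall back on the CI model.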

Highlights

  • Speech enhancement pertains to the processing of speech corrupted by noise, echo, reverberation, etc., to improve its quality and intelligibility

  • We introduce a Bayesian framework to optimally combine the estimates from the CD and CI models to achieve robust speech enhancement under varying contexts

  • The first set consisted of experiments with two speech codebooks, a CI speech codebook and a CD speech codebook, modeling the speaker and acoustic environment as aspects of context


Introduction

Speech enhancement pertains to the processing of speech corrupted by noise, echo, reverberation, etc., to improve its quality and intelligibility. In this paper, we refer specifically to the problem of noise reduction. It is relevant in several scenarios: for example, mobile telephony in noisy environments, such as restaurants and busy traffic, suffers from unclear communication. Speech recognition units [1] and hearing aids [2] require speech enhancement as a preprocessing algorithm. Speech enhancement algorithms can be broadly classified into single- and multi-channel algorithms based on the number of microphones used to acquire the input noisy speech. Multi-channel algorithms exhibit superior performance because of the additional spatial information available about the noise and speech sources. Nevertheless, the need for single-channel speech enhancement cannot be overstated.

