Abstract

A new Bayesian estimation framework for statistical feature extraction, in the form of cepstral enhancement, is presented. The framework exploits a joint prior distribution over both the static and the frame-differential dynamic cepstral parameters in the clean speech model. The conditional minimum mean square error (MMSE) estimator for the clean speech feature is derived using the full posterior probability of clean speech given the noisy observation. The final form of the estimator (for each mixture component) is a weighted sum of two kinds of term: the prior information, with the static and the dynamic priors contributing separately, and the prediction obtained from the acoustic distortion model in the absence of any prior information. Comprehensive noise-robust speech recognition experiments on the Aurora2 database demonstrate that incorporating the joint prior significantly improves accuracy compared with using only the static prior, only the dynamic prior, or no prior at all.
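
The weighted-sum form described above can be illustrated with a minimal sketch. It assumes, purely for illustration, a single mixture component, a single scalar cepstral coefficient, and that each of the three information sources (static prior, dynamic prior, distortion-model prediction) yields an independent Gaussian estimate of the clean feature, so that the weights reduce to normalized precisions. The function names, the parameters, and the precision weighting itself are hypothetical and are not taken from the paper, whose actual weights follow from its full posterior derivation.

```python
def combine_gaussian_estimates(means, variances):
    """Precision-weighted average of independent Gaussian estimates of one scalar.

    Returns the combined estimate and the normalized weights (which sum to 1).
    """
    precisions = [1.0 / v for v in variances]
    total = sum(precisions)
    weights = [p / total for p in precisions]
    estimate = sum(w * m for w, m in zip(weights, means))
    return estimate, weights


def enhance_frame(x_pred, var_pred, mu_static, var_static,
                  x_prev, mu_delta, var_delta):
    """Illustrative estimate of a clean cepstral coefficient x_t (hypothetical).

    Three assumed sources of information about x_t:
      1. static prior:   x_t ~ N(mu_static, var_static)
      2. dynamic prior:  x_t - x_prev ~ N(mu_delta, var_delta),
                         i.e. x_t ~ N(x_prev + mu_delta, var_delta)
      3. prediction from the distortion model alone: x_t ~ N(x_pred, var_pred)
    """
    means = [mu_static, x_prev + mu_delta, x_pred]
    variances = [var_static, var_delta, var_pred]
    return combine_gaussian_estimates(means, variances)


if __name__ == "__main__":
    estimate, weights = enhance_frame(
        x_pred=2.0, var_pred=1.0,
        mu_static=0.0, var_static=1.0,
        x_prev=1.0, mu_delta=0.0, var_delta=1.0,
    )
    print(estimate, weights)
```

In this toy setting, a source with a large variance receives a small weight, so dropping the static or the dynamic prior corresponds to letting its variance grow without bound; with both priors removed, the estimate falls back to the distortion-model prediction alone.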

