Abstract

This paper describes a data-driven technique for optimizing the acoustic models of speech recognition systems that target commercial telephone applications. Frame-averaged foreground log-likelihoods (foreground scores) correlate with recognition errors. These scores are used together with gender to optimize the data weighting for the acoustic model. This process can be interpreted as increasing the priors and associated parameters for poorly modeled data. The score-based optimization leads to about 7% fewer semantic errors on a live evaluation set collected after the most recent data used to estimate the acoustic model.
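
The abstract does not spell out the weighting rule, so the following is only a minimal sketch of the general idea, under stated assumptions: each utterance gets a foreground score (the mean of its per-frame log-likelihoods), is placed in a bucket by gender and by whether that score falls below a threshold, and receives a training weight from its bucket. The function names, the single `threshold`, the two-way low/high split, and the `bucket_weights` table are all hypothetical and are not taken from the paper.

```python
def frame_averaged_score(frame_log_likelihoods):
    """Return the mean per-frame log-likelihood of one utterance."""
    return sum(frame_log_likelihoods) / len(frame_log_likelihoods)


def bucket(utterance, threshold):
    """Hypothetical bucketing: (gender, 'low' or 'high' foreground score)."""
    score = frame_averaged_score(utterance["frames"])
    return (utterance["gender"], "low" if score < threshold else "high")


def assign_weights(utterances, threshold, bucket_weights):
    """Look up a training weight per utterance; unlisted buckets default to 1.0.

    `bucket_weights` is an assumed lookup table, e.g. {("f", "low"): 2.0},
    that up-weights poorly modeled (low-scoring) data.
    """
    return [bucket_weights.get(bucket(u, threshold), 1.0) for u in utterances]


# Illustrative data only; these values are made up for the example.
utterances = [
    {"frames": [-2.0, -4.0], "gender": "f"},
    {"frames": [-1.0, -1.0], "gender": "m"},
]
weights = assign_weights(utterances, threshold=-2.0,
                         bucket_weights={("f", "low"): 2.0})
```

In this sketch the weights would be tuned on held-out data; how the paper actually searches for them is not stated in the abstract.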
