Abstract

The problem of adapting acoustic models of native English speech to nonnative speakers is addressed from the perspective of adaptive model complexity selection. The goal is to select model complexity dynamically for each nonnative talker so as to balance model robustness to pronunciation variations against the model detail needed to discriminate speech sounds. A maximum expected likelihood (MEL) based technique is proposed to enable reliable complexity selection when adaptation data are sparse: the expected log-likelihood (EL) of the adaptation data is computed from the distributions of mismatch biases between model and data, and the model complexity that maximizes EL is selected. The MEL-based complexity selection is further combined with maximum likelihood linear regression (MLLR) to adapt both the complexity and the parameters of the acoustic models. Experiments were performed on WSJ1 data from speakers with a wide range of foreign accents. Results show that MEL-based complexity selection is feasible with as little as one adaptation utterance, and that it dynamically selects the proper model complexity as the amount of adaptation data increases. Compared with standard MLLR, the MEL+MLLR method yields consistent and significant improvements in recognition accuracy on nonnative speakers, without performance degradation on native speakers.
