Abstract

We propose novel speaker-independent (SI) modeling and speaker adaptation methods based on a linear transformation. An SI model and speaker-dependent (SD) models are usually generated using the same preprocessing of acoustic data. This straightforward preprocessing causes a serious problem: because the training data retain speaker characteristics, the probability distributions of the SI model become broad, and the SI model does not provide good initial estimates for speaker adaptation. To solve this problem, a normalized SI model is generated by removing speaker characteristics with a shift vector obtained by the maximum likelihood linear regression (MLLR) technique. In addition, we propose a speaker adaptation method that combines the MLLR and maximum a posteriori (MAP) techniques, starting from the normalized SI model. Experiments were performed on a Japanese phoneme recognition task using continuous-density Gaussian mixture HMMs. In the baseline recognition test, the normalized SI model achieved a 12.8% reduction in the phoneme recognition error rate compared with the conventional SI model. Furthermore, the proposed adaptation method using the normalized SI model was more effective than the conventional method tested, regardless of the amount of adaptation data.
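The abstract does not give the estimation formulas. The following Python sketch shows one common way to compute a bias-only (shift-vector) MLLR transform and a MAP mean update for diagonal-covariance Gaussians. It assumes that the occupation probabilities `gammas` have already been obtained, for example from a forward-backward pass. The function names `estimate_shift` and `map_mean`, the prior weight `tau`, and the order of operations (apply the shift first, then MAP) are illustrative assumptions, not the paper's exact procedure.

```python
import numpy as np

def estimate_shift(obs, gammas, means, variances):
    """ML estimate of a global bias-only MLLR transform, mu_m -> mu_m + b.

    obs:       (T, D) observation frames of one speaker
    gammas:    (T, M) occupation probability of each Gaussian per frame
    means:     (M, D) Gaussian means
    variances: (M, D) diagonal variances
    """
    # Residual of every frame against every Gaussian mean: (T, M, D)
    resid = obs[:, None, :] - means[None, :, :]
    # Occupancy-weighted precisions: (T, M, D)
    w = gammas[:, :, None] / variances[None, :, :]
    # Setting the derivative of the log-likelihood w.r.t. b to zero
    # gives a precision-weighted average of the residuals.
    return (w * resid).sum(axis=(0, 1)) / w.sum(axis=(0, 1))

def map_mean(prior_mean, obs, gamma, tau=10.0):
    """MAP update of one Gaussian mean; tau is a hypothetical prior weight."""
    occ = gamma.sum()
    return (tau * prior_mean + gamma @ obs) / (tau + occ)

# Toy example: one speaker whose frames are offset from the SI means.
means = np.zeros((2, 2))
variances = np.ones((2, 2))
obs = np.tile([1.0, -2.0], (4, 1))
gammas = np.tile([1.0, 0.0], (4, 1))

b = estimate_shift(obs, gammas, means, variances)
normalized = obs - b  # speaker shift removed before SI training (assumed)
adapted = map_mean(means[0] + b, obs, gammas[:, 0], tau=4.0)
```

In this sketch the same shift vector serves two purposes: subtracting it from a training speaker's data removes that speaker's offset before the normalized SI model is trained, and adding it to the SI means gives the starting point that the MAP update then refines with the adaptation data.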
