Abstract

Eigenvoice (EV) speaker adaptation has been shown effective for fast speaker adaptation when the amount of adaptation data is scarce. In the past two years, we have been investigating the application of kernel methods to improve EV speaker adaptation by exploiting possible nonlinearity in the speaker space, and two methods were proposed: embedded kernel eigenvoice (eKEV) and kernel eigenspace-based MLLR (KEMLLR). In both methods, kernel PCA is used to derive eigenvoices in the kernel-induced high-dimensional feature space, and they differ mainly in the representation of the speaker models. Both had been shown to outperform all other common adaptation methods when the amount of adaptation data is less than 10s. However, in the past, only small vocabulary speech recognition tasks were tried since we were not familiar with the behaviour of these kernelized methods. As we gain more experience, we are now ready to tackle larger vocabularies. In this paper, we show that both methods continue to outperform MAP, and MLLR when only 5s or 10s of adaptation data are available on the WSJ0 5K-vocabulary task. Compared with the speakerindependent model, the two methods reduce recognition word error rate by 13.4% – 21.1%.

Full Text
Paper version not known

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call

Disclaimer: All third-party content on this website/platform is and will remain the property of their respective owners and is provided on "as is" basis without any warranties, express or implied. Use of third-party content does not indicate any affiliation, sponsorship with or endorsement by them. Any references to third-party content is to identify the corresponding services and shall be considered fair use under The CopyrightLaw.