Abstract

The use of spectral transformation to perform speaker adaptation for HMM-based speech recognition is investigated. Three methods for estimating the transformation are compared: minimum mean square error (MMSE), canonical correlation analysis (CCA), and multilayer perceptrons (MLP). Using isolated words from the TI-46 database, CCA is found to give the best adaptation performance. Moreover, a training-after-adaptation approach is found to outperform the approach in which the reference HMMs are not re-trained. With a suitable choice of reference speaker, and with the CCA method combined with the training-after-adaptation approach, less than 30% of the training data from a new speaker is required to achieve the same accuracy as the speaker-dependent models of that new speaker.
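
To make the idea of an estimated spectral transformation concrete, the following is a minimal sketch of a least-squares (MMSE) fit. It is not the paper's implementation. It assumes that frames from the two speakers have already been time-aligned (for example by dynamic time warping), that each spectral dimension is mapped independently by an affine function, and that the mapping runs from a source speaker to a target speaker; the abstract specifies none of these details, and the names `fit_affine_mmse` and `apply_transform` are hypothetical.

```python
from typing import List, Tuple

Frame = List[float]  # one spectral feature vector (assumed representation)


def fit_affine_mmse(src: List[Frame], tgt: List[Frame]) -> List[Tuple[float, float]]:
    """Fit tgt[d] ~= a * src[d] + b per dimension d by least squares.

    Assumption: src[i] and tgt[i] are already time-aligned frame pairs.
    Returns one (a, b) pair per spectral dimension.
    """
    if not src or len(src) != len(tgt):
        raise ValueError("need the same non-zero number of aligned frames")
    n = len(src)
    dims = len(src[0])
    params = []
    for d in range(dims):
        xs = [frame[d] for frame in src]
        ys = [frame[d] for frame in tgt]
        mean_x = sum(xs) / n
        mean_y = sum(ys) / n
        var_x = sum((x - mean_x) ** 2 for x in xs)
        cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
        # Closed-form minimiser of the squared error; fall back to a pure
        # shift (slope 1) when the source dimension is constant.
        a = cov_xy / var_x if var_x > 0.0 else 1.0
        b = mean_y - a * mean_x
        params.append((a, b))
    return params


def apply_transform(params: List[Tuple[float, float]], frame: Frame) -> Frame:
    """Map one source frame into the target speaker's spectral space."""
    return [a * x + b for (a, b), x in zip(params, frame)]


# Toy aligned data (invented for illustration, not from TI-46).
source_frames = [[0.0, 1.0], [1.0, 2.0], [2.0, 4.0]]
target_frames = [[1.0, 2.0], [3.0, 1.0], [5.0, -1.0]]
transform = fit_affine_mmse(source_frames, target_frames)
mapped = apply_transform(transform, [3.0, 0.0])
```

A full-matrix linear map, CCA, or an MLP would replace the per-dimension fit here; the per-dimension form is used only to keep the sketch short and free of dependencies.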
