Abstract

Conventional speaker-independent HMMs ignore the differences between speakers and pool the speech data of all speakers in a single observation space. As a result, the output probability distributions of the HMMs become vague, which degrades recognition accuracy. To solve this problem, we construct a speaker subspace for each individual speaker and correlate the subspaces of the standard speaker and the input speaker by O-space canonical correlation analysis. The supervised normalization method we proposed earlier requires input speakers to utter the same sentences as the standard speaker. To remove this constraint, we propose an unsupervised speaker normalization method that automatically segments the speech data into phoneme segments with the Viterbi decoding algorithm and then associates the mean feature vectors of these phoneme segments by O-space canonical correlation analysis. We show that the phoneme recognition rate of this unsupervised method is equivalent to that of the supervised normalization method.
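The abstract does not specify what distinguishes O-space canonical correlation analysis, so the following is only a sketch that uses ordinary linear CCA as a stand-in. It assumes that row i of `x` is the mean feature vector of phoneme i for the standard speaker and row i of `y` is the mean vector of the same phoneme for the input speaker (in the unsupervised method these means would come from Viterbi segmentation, which is not shown). The function name `cca`, the ridge term `reg`, and the component count `k` are hypothetical choices, not the authors'.

```python
import numpy as np


def _inv_sqrt(c: np.ndarray) -> np.ndarray:
    """Inverse square root of a symmetric positive-definite matrix."""
    w, v = np.linalg.eigh(c)
    return v @ np.diag(1.0 / np.sqrt(w)) @ v.T


def cca(x: np.ndarray, y: np.ndarray, k: int, reg: float = 1e-6):
    """Plain linear CCA between paired rows of x (n, p) and y (n, q).

    Stand-in for the paper's O-space CCA (assumption). Returns projection
    matrices wx (p, k), wy (q, k) and the k largest canonical correlations.
    """
    # Center each speaker's phoneme-mean vectors.
    x = x - x.mean(axis=0)
    y = y - y.mean(axis=0)
    n = x.shape[0]
    # Within- and cross-covariances; a small ridge keeps cxx, cyy invertible.
    cxx = x.T @ x / (n - 1) + reg * np.eye(x.shape[1])
    cyy = y.T @ y / (n - 1) + reg * np.eye(y.shape[1])
    cxy = x.T @ y / (n - 1)
    # Whiten both sides; the SVD of the whitened cross-covariance gives
    # the canonical directions and correlations (sorted in descending order).
    kx, ky = _inv_sqrt(cxx), _inv_sqrt(cyy)
    u, s, vt = np.linalg.svd(kx @ cxy @ ky)
    wx = kx @ u[:, :k]
    wy = ky @ vt.T[:, :k]
    return wx, wy, s[:k]


if __name__ == "__main__":
    # Synthetic example: 40 hypothetical phoneme classes, 12-dim features.
    rng = np.random.default_rng(0)
    std_means = rng.normal(size=(40, 12))
    inp_means = std_means @ rng.normal(size=(12, 12)) + 0.01 * rng.normal(size=(40, 12))
    wx, wy, corr = cca(std_means, inp_means, k=3)
    print(corr)
```

Projecting both speakers' centered features onto `wx` and `wy` places them in a shared space where corresponding phonemes are maximally correlated; the number of phoneme classes must exceed the feature dimension for the covariance estimates to be well conditioned.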

