Abstract

The authors describe a speaker adaptation method consisting of two stages. In the first stage, label prototypes, which represent spectral features, are modified to reduce the total vector quantization distortion for a new speaker. In the second stage, well-trained hidden Markov model (HMM) parameters are transformed by a linear mapping function, which is estimated by counting correspondences along the alignment between an HMM state sequence and the label sequence of a new speaker's utterance. This adaptation procedure was tested in an isolated word recognition task using 150 confusable Japanese words. The original label prototypes and HMM parameters were estimated for a male speaker, who spoke each word 10 times. When the adaptation procedure was applied with 25 words, the average error rate for another seven male speakers was reduced from 25.0% to 5.6%, which was roughly the same as that for the original speaker. This procedure was also effective for adaptation between male and female speakers.
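
The following is a minimal Python sketch of the two-stage scheme as described above. The single-iteration prototype update, the smoothing constant, and all function and parameter names are illustrative assumptions, not the authors' exact formulation; the stage-2 mapping is modeled here as a row-stochastic matrix estimated from frame-level alignment counts.

```python
# Hypothetical sketch of the two-stage adaptation described in the abstract.
# The update rule, smoothing, and all names are assumptions for illustration.
import numpy as np

def adapt_prototypes(prototypes, frames, step=0.5):
    """Stage 1: shift VQ label prototypes toward a new speaker's spectra.

    prototypes: (K, D) original codebook vectors
    frames:     (T, D) spectral feature frames from the new speaker
    Returns an updated (K, D) codebook with lower total VQ distortion
    on the adaptation frames (one Lloyd-style iteration; assumed detail).
    """
    # Assign each adaptation frame to its nearest prototype.
    dists = np.linalg.norm(frames[:, None, :] - prototypes[None, :, :], axis=2)
    nearest = dists.argmin(axis=1)
    adapted = prototypes.astype(float).copy()
    for k in range(len(prototypes)):
        assigned = frames[nearest == k]
        if len(assigned):
            # Move the prototype part way toward the mean of its assigned frames.
            adapted[k] += step * (assigned.mean(axis=0) - prototypes[k])
    return adapted

def estimate_label_mapping(aligned_pairs, num_labels):
    """Stage 2a: estimate a linear (row-stochastic) mapping between label sets.

    aligned_pairs: iterable of (reference_labels, new_speaker_labels) sequences
                   obtained from a frame-level alignment of an HMM state/label
                   sequence with the new speaker's label sequence.
    Returns M with M[i, j] ~ P(new label j | reference label i).
    """
    counts = np.zeros((num_labels, num_labels))
    for ref_seq, new_seq in aligned_pairs:
        for i, j in zip(ref_seq, new_seq):
            counts[i, j] += 1.0
    counts += 1e-3                      # smoothing to avoid empty rows (assumption)
    return counts / counts.sum(axis=1, keepdims=True)

def adapt_output_probs(output_probs, mapping):
    """Stage 2b: transform HMM output distributions with the linear mapping.

    output_probs: (num_states, num_labels), rows are per-state label distributions.
    """
    adapted = output_probs @ mapping    # linear transform of each distribution
    return adapted / adapted.sum(axis=1, keepdims=True)
```

In this reading, stage 1 only re-tunes the codebook so the new speaker's frames quantize with low distortion, while stage 2 reuses the original, well-trained HMMs and only "translates" their output label distributions through the estimated mapping, which is why a small adaptation set (25 words in the reported experiment) can suffice.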
