Abstract

The author proposes an automatic speaker adaptation algorithm for speech recognition that can use a small amount of training material of unspecified text. The algorithm is easily applied to vector-quantization-based (VQ) speech recognition systems consisting of a VQ codebook and a word dictionary in which each word is represented as a sequence of codebook entries. In the adaptation algorithm, the VQ codebook is modified for each new speaker, whereas the word dictionary is shared by all speakers. The key feature of this algorithm is that the set of spectra in the training frames and the codebook entries are both clustered hierarchically. Adaptation is based on the vectors representing the deviation between the centroids of the training-frame clusters and those of the corresponding codebook clusters, and it proceeds hierarchically from small to large numbers of clusters, so the spectral resolution of the adaptation process improves step by step. Results of recognition experiments using utterances of 100 Japanese city names show that adaptation reduces the mean word recognition error rate from 4.9% to 2.9%. Since the error rate for speaker-dependent recognition is 2.2%, adaptation closes most of the gap between unadapted and speaker-dependent performance.
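
The coarse-to-fine adaptation described in the abstract can be illustrated with a minimal sketch. It is not the author's implementation: the abstract does not specify the clustering method, the number of levels, or how training-frame clusters are matched to codebook clusters, so this sketch assumes plain k-means on the codebook, assigns each training frame to the nearest codebook-cluster centroid, and doubles the cluster count at each level. All names (`adapt_codebook`, `kmeans`, `levels`) are hypothetical.

```python
from typing import List, Sequence

Vector = List[float]


def mean(vectors: List[Vector]) -> Vector:
    """Component-wise mean (centroid) of a non-empty list of vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def sq_dist(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest(v: Vector, centroids: List[Vector]) -> int:
    """Index of the centroid closest to v."""
    return min(range(len(centroids)), key=lambda i: sq_dist(v, centroids[i]))


def kmeans(vectors: List[Vector], k: int, iters: int = 10) -> List[Vector]:
    """Plain k-means, initialised from the first k vectors (an assumption)."""
    centroids = [list(v) for v in vectors[:k]]
    for _ in range(iters):
        groups: List[List[Vector]] = [[] for _ in centroids]
        for v in vectors:
            groups[nearest(v, centroids)].append(v)
        # Keep the previous centroid when a cluster ends up empty.
        centroids = [mean(g) if g else c for g, c in zip(groups, centroids)]
    return centroids


def adapt_codebook(
    codebook: List[Vector],
    frames: List[Vector],
    levels: Sequence[int] = (1, 2, 4),
) -> List[Vector]:
    """Shift codebook entries toward a new speaker, coarse to fine.

    At each level the codebook is split into k clusters, every training
    frame is assigned to the nearest cluster, and all entries of a cluster
    are moved by the deviation between the two cluster centroids.
    """
    adapted = [list(c) for c in codebook]
    for k in levels:
        k = min(k, len(adapted))
        centroids = kmeans(adapted, k)
        labels = [nearest(c, centroids) for c in adapted]
        frame_groups: List[List[Vector]] = [[] for _ in range(k)]
        for f in frames:
            frame_groups[nearest(f, centroids)].append(f)
        for j in range(k):
            members = [adapted[i] for i, lab in enumerate(labels) if lab == j]
            if not members or not frame_groups[j]:
                continue  # no evidence for this cluster at this level
            code_centroid = mean(members)
            frame_centroid = mean(frame_groups[j])
            shift = [f - c for f, c in zip(frame_centroid, code_centroid)]
            for i, lab in enumerate(labels):
                if lab == j:
                    adapted[i] = [x + s for x, s in zip(adapted[i], shift)]
    return adapted
```

In this sketch the word dictionary never changes: it stores codebook indices, and only the vectors behind those indices move. The first level (one cluster) applies a single global shift, and later levels refine it locally, which mirrors the improvement in spectral resolution the abstract describes.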
