Abstract

Modern speech recognition systems are based on the hidden Markov model (HMM) and employ cepstral features to represent input speech. In speaker normalization, the cepstral features of speech from a given speaker are transformed to match the speaker-independent HMM. In speaker adaptation, the means of the HMM are transformed to match the input speech. Vocal tract length normalization (VTLN) is a popular normalization scheme wherein the frequency axis of the short-time spectrum is rescaled prior to the extraction of cepstral features. In this work, we develop novel speaker adaptation schemes by exploiting the fact that frequency domain transformations similar to the one inherent in VTLN can be accomplished entirely in the cepstral domain through the use of conformal maps. We describe two classes of such maps: rational all-pass transforms (RAPTs), which are well known in the signal processing literature, and sine-log all-pass transforms (SLAPTs), which are introduced in this work. For both classes of maps, we develop the relations necessary to perform maximum likelihood estimation of the relevant transform parameters using enrollment data from a new speaker. We also propose the means by which an HMM may be trained specifically for use with this type of adaptation. Finally, in a set of recognition experiments conducted on conversational speech material from the Switchboard Corpus as well as the English Spontaneous Scheduling Task, we demonstrate the capacity of speaker adaptation based on all-pass transforms (APTs) to achieve word error rate reductions superior to those obtained with other popular adaptation techniques, and moreover, reductions that are additive with those provided by VTLN.
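
As an illustration of the kind of frequency rescaling an all-pass map produces, the sketch below uses the first-order bilinear transform, which is the simplest rational all-pass map. The abstract does not give the form of the transforms used in this work, so this choice is an assumption made for illustration only; the names `bilinear_allpass`, `warp_frequency`, and the parameter `alpha` are hypothetical.

```python
import cmath
import math


def bilinear_allpass(z: complex, alpha: float) -> complex:
    """First-order all-pass (bilinear) map Q(z) = (z - alpha) / (1 - alpha * z).

    Assumption for illustration: a single real parameter with |alpha| < 1.
    On the unit circle this map has unit magnitude, so it only changes phase.
    """
    return (z - alpha) / (1.0 - alpha * z)


def warp_frequency(omega: float, alpha: float) -> float:
    """Warped frequency, i.e. the phase of Q(exp(j * omega)).

    Closed form: omega + 2 * atan(alpha * sin(omega) / (1 - alpha * cos(omega))).
    The endpoints 0 and pi stay fixed; alpha = 0 leaves the axis unchanged.
    """
    return omega + 2.0 * math.atan2(
        alpha * math.sin(omega), 1.0 - alpha * math.cos(omega)
    )


if __name__ == "__main__":
    # Print the warped frequency axis for a few illustrative alpha values.
    for alpha in (-0.1, 0.0, 0.1):
        warped = [warp_frequency(k * math.pi / 4, alpha) for k in range(5)]
        print(alpha, [round(w, 3) for w in warped])
```

A single parameter of this kind stretches or compresses the frequency axis while keeping its endpoints fixed, which is the behaviour that makes such maps comparable to the rescaling in VTLN.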
