Abstract
In statistical speech recognition, speaker-independent models are usually trained on speech samples from a large number of speakers. Such models suffer from wider feature distributions, and hence greater overlaps between different phones, than adequately trained speaker-dependent models. To cope with this interspeaker variability, a method of speech feature normalization based on affine transformation has been presented [P. Luo and K. Ozeki, Tech. Rep. of IEICE, SP96-10 (1996)]. Prior to HMM training, the feature vectors of each speaker are mapped to those of a reference speaker by an affine transformation estimated from a small amount of training data. The transformation, which is phone independent and speaker dependent, is also applied to the feature vectors of unknown speakers in the recognition stage. It has been shown experimentally that this method is effective in reducing interspeaker variation in the cepstral domain. In this paper, the performance and limitations of the method are discussed further within the framework of continuous HMMs. Practical issues related to the selection of an appropriate reference speaker are also discussed.
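The core operation described above, estimating a speaker-dependent affine map that sends one speaker's cepstral vectors toward a reference speaker's, can be illustrated with a deliberately simplified sketch. The paper estimates a full affine transformation; the version below restricts the matrix to a diagonal (an independent least-squares fit of y ≈ a·x + b per cepstral coefficient) purely for brevity, and all function names and the paired-frame setup are illustrative assumptions, not the authors' actual formulation.

```python
def fit_affine_1d(xs, ys):
    """Least-squares fit of y ≈ a*x + b for one cepstral coefficient.

    xs: this speaker's values for the coefficient (paired frames)
    ys: the reference speaker's values for the same frames
    """
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    a = sxy / sxx          # slope: scales the coefficient
    b = my - a * mx        # offset: shifts it toward the reference
    return a, b


def fit_affine_per_dim(X, Y):
    """Fit an independent (a, b) per feature dimension (diagonal matrix).

    X, Y: lists of paired feature vectors (speaker vs. reference).
    The real method uses a full affine matrix; this diagonal variant
    is a simplified stand-in for illustration only.
    """
    dims = len(X[0])
    return [
        fit_affine_1d([x[i] for x in X], [y[i] for y in Y])
        for i in range(dims)
    ]


def normalize(x, params):
    """Apply the estimated speaker-dependent map to one feature vector."""
    return [a * xi + b for xi, (a, b) in zip(x, params)]
```

Once estimated on a small amount of enrollment data, the same phone-independent map would be applied to every frame of that speaker, both before HMM training and at recognition time for unknown speakers.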