Abstract
In this article, we describe a speaker adaptation method based on the probabilistic 2-mode analysis of training models. Probabilistic 2-mode analysis is a probabilistic extension of multilinear analysis. We apply it to speaker adaptation by representing the hidden Markov model (HMM) mean vectors of each training speaker as a matrix, and derive the speaker adaptation equation in the maximum a posteriori (MAP) framework. The resulting adaptation equation is similar to that of MAP linear regression (MAPLR) adaptation. In the experiments, the adapted models based on probabilistic 2-mode analysis outperformed the adapted models based on Tucker decomposition, a representative multilinear decomposition technique, for small amounts of adaptation data, while maintaining good performance for large amounts of adaptation data.
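To make the 2-mode construction concrete, the following is a minimal, non-probabilistic sketch in Python/NumPy of how each training speaker's mean vectors can be arranged as a matrix, stacked into a third-order tensor, and decomposed into row and column bases with a higher-order SVD. All variable names and dimensions are illustrative assumptions, and the article's probabilistic estimation and MAP adaptation rule are not reproduced here.

# Minimal, non-probabilistic 2-mode (Tucker-2) sketch over training-speaker
# HMM mean vectors.  Dimensions and names are hypothetical; the article's
# probabilistic variant replaces the SVD steps with EM-style estimation and
# a MAP adaptation rule, which are not shown.
import numpy as np

S, D, M = 50, 39, 200            # speakers, feature dim, mean vectors per speaker (assumed)
rng = np.random.default_rng(0)

# Each training speaker's HMM mean vectors arranged as a D x M matrix,
# stacked into a third-order tensor of shape (S, D, M).
means = rng.standard_normal((S, D, M))

def mode_unfold(T, mode):
    # Unfold a third-order tensor along the given mode into a matrix.
    return np.moveaxis(T, mode, 0).reshape(T.shape[mode], -1)

# 2-mode analysis: orthonormal row and column bases from the mode-2 and
# mode-3 unfoldings (higher-order SVD), truncated to a few components.
r_row, r_col = 10, 20
U = np.linalg.svd(mode_unfold(means, 1), full_matrices=False)[0][:, :r_row]  # D x r_row
V = np.linalg.svd(mode_unfold(means, 2), full_matrices=False)[0][:, :r_col]  # M x r_col

# For a new speaker, adaptation reduces to estimating a small core matrix W;
# with complete data a least-squares projection is W = U^T X V, and the
# adapted mean matrix is reconstructed as U W V^T.
X_new = rng.standard_normal((D, M))      # stand-in for a new speaker's statistics
W = U.T @ X_new @ V
X_adapted = U @ W @ V.T
print(X_adapted.shape)                   # (39, 200)

In practice the core matrix would be estimated from limited adaptation data rather than from a complete mean matrix, which is where the MAP formulation of the article comes in.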
Highlights
In automatic speech recognition (ASR) systems using hidden Markov models (HMMs) [1], mismatches between the training and testing conditions lead to performance degradation.
Speaker adaptation based on tensor analysis using Tucker decomposition [4] was investigated in [5], where bases were constructed from the multilinear decomposition of a tensor consisting of the HMM mean vectors of training speakers.
We describe a speaker adaptation method using probabilistic 2-mode analysis, which is an application of probabilistic tensor analysis (PTA) [8] to second-order tensors; PTA is an application of probabilistic principal component analysis (PPCA) [9] to tensor objects (a minimal PPCA sketch is given after these highlights).
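Because probabilistic 2-mode analysis rests on PPCA, the sketch below illustrates only the standard closed-form maximum-likelihood PPCA solution of Tipping and Bishop on synthetic data; all names and sizes are assumptions, and the matrix-variate (2-mode) extension that PTA provides is not shown.

# Minimal PPCA sketch (Tipping & Bishop closed-form ML solution) on
# synthetic data; dimensions are hypothetical and the 2-mode / tensor
# extension used in the article is not reproduced here.
import numpy as np

rng = np.random.default_rng(0)
N, d, q = 500, 20, 5                     # samples, observed dim, latent dim (assumed)
X = rng.standard_normal((N, d)) @ rng.standard_normal((d, d))

mu = X.mean(axis=0)
S = np.cov(X - mu, rowvar=False)         # sample covariance
eigval, eigvec = np.linalg.eigh(S)
order = np.argsort(eigval)[::-1]         # sort eigenpairs in descending order
eigval, eigvec = eigval[order], eigvec[:, order]

sigma2 = eigval[q:].mean()               # ML noise variance: mean of discarded eigenvalues
W = eigvec[:, :q] @ np.diag(np.sqrt(eigval[:q] - sigma2))   # ML loading matrix

# Posterior mean of the latent variables (the PPCA E-step), analogous to
# estimating speaker-dependent weights from adaptation data.
M = W.T @ W + sigma2 * np.eye(q)
Z = np.linalg.solve(M, W.T @ (X - mu).T).T
print(W.shape, Z.shape)                  # (20, 5) (500, 5)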
Summary
In automatic speech recognition (ASR) systems using hidden Markov models (HMMs) [1], mismatches between the training and testing conditions lead to performance degradation. One such mismatch results from speaker variation. The experiments showed that the proposed method further improved the performance of speaker adaptation based on Tucker decomposition for small amounts of adaptation data.