Abstract

In this article, we describe a speaker adaptation method based on the probabilistic 2-mode analysis of training models. Probabilistic 2-mode analysis is a probabilistic extension of multilinear analysis. We apply probabilistic 2-mode analysis to speaker adaptation by representing each of the hidden Markov model mean vectors of training speakers as a matrix, and derive the speaker adaptation equation in the maximum a posteriori (MAP) framework. The resulting adaptation equation is similar to that of MAP linear regression adaptation. In the experiments, the adapted models based on probabilistic 2-mode analysis outperformed the adapted models based on Tucker decomposition, a representative multilinear decomposition technique, for small amounts of adaptation data, while maintaining good performance for large amounts of adaptation data.

Highlights

  • In automatic speech recognition (ASR) systems using hidden Markov models (HMMs) [1], mismatches between the training and testing conditions lead to performance degradation

  • Speaker adaptation based on tensor analysis using Tucker decomposition [4] was investigated in [5], where bases were constructed from the multilinear decomposition of a tensor that consisted of the HMM mean vectors of training speakers

  • We describe a speaker adaptation method using probabilistic 2-mode analysis, which is an application of probabilistic tensor analysis (PTA) [8] to the second-order tensor; PTA is an application of probabilistic principal component analysis (PCA) (PPCA) [9] to tensor objects


Summary

Introduction

In automatic speech recognition (ASR) systems using hidden Markov models (HMMs) [1], mismatches between the training and testing conditions lead to performance degradation. One such mismatch results from speaker variation. In the experiments, the proposed method improved on speaker adaptation based on Tucker decomposition for small amounts of adaptation data.
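To make the matrix representation concrete, the following is a minimal sketch of a plain (non-probabilistic) 2-mode analysis of speaker mean matrices, in the spirit of the Tucker-style baseline rather than the probabilistic model proposed in the article. It assumes each training speaker is given as a mean supervector (the Gaussian mean vectors stacked end to end), reshapes it into a matrix with one column per Gaussian, and obtains row and column bases from the SVD of the two unfoldings. All function names, dimensions, and the random data are hypothetical and chosen only for illustration.

```python
import numpy as np


def to_matrices(supervectors, dim):
    """Reshape (S, dim * R) supervectors into (S, dim, R) matrices.

    Column r of each matrix is the r-th Gaussian mean vector.
    """
    n_speakers, total = supervectors.shape
    n_gauss = total // dim
    return supervectors.reshape(n_speakers, n_gauss, dim).transpose(0, 2, 1)


def two_mode_bases(mats, k_row, k_col):
    """Return the mean matrix plus row and column bases (HOSVD-style)."""
    mean = mats.mean(axis=0)
    centered = mats - mean
    s, d, r = centered.shape
    # Unfold along the feature (row) mode: d x (s * r).
    row_unfold = centered.transpose(1, 0, 2).reshape(d, s * r)
    # Unfold along the Gaussian (column) mode: r x (s * d).
    col_unfold = centered.transpose(2, 0, 1).reshape(r, s * d)
    u_row = np.linalg.svd(row_unfold, full_matrices=False)[0][:, :k_row]
    u_col = np.linalg.svd(col_unfold, full_matrices=False)[0][:, :k_col]
    return mean, u_row, u_col


def project(mat, mean, u_row, u_col):
    """Low-dimensional core matrix of one speaker."""
    return u_row.T @ (mat - mean) @ u_col


def reconstruct(core, mean, u_row, u_col):
    """Map a core matrix back to a full mean matrix."""
    return mean + u_row @ core @ u_col.T


# Toy data (assumed): 5 speakers, 4 Gaussians, 3-dimensional features.
rng = np.random.default_rng(0)
supervectors = rng.normal(size=(5, 12))
mats = to_matrices(supervectors, dim=3)

# With full-rank bases the reconstruction is exact; smaller k_row and
# k_col give a compressed speaker representation.
mean, u_row, u_col = two_mode_bases(mats, k_row=3, k_col=4)
core = project(mats[0], mean, u_row, u_col)
approx = reconstruct(core, mean, u_row, u_col)
```

In an adaptation setting, the core matrix of a new speaker would be estimated from adaptation data rather than by projection; the article derives that estimate in the MAP framework, which this sketch does not attempt to reproduce.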

Multilinear decomposition
Speaker adaptation using Tucker decomposition
Construction of probabilistic 2-mode model for speaker adaptation
Method
Findings
Conclusions
