Speaker adaptation in the maximum a posteriori framework based on the probabilistic 2-mode analysis of training models

Yongwon Jeong

doi:10.1186/1687-4722-2013-7

Abstract

In this article, we describe a speaker adaptation method based on the probabilistic 2-mode analysis of training models. Probabilistic 2-mode analysis is a probabilistic extension of multilinear analysis. We apply probabilistic 2-mode analysis to speaker adaptation by representing each of the hidden Markov model mean vectors of training speakers as a matrix, and derive the speaker adaptation equation in the maximum a posteriori (MAP) framework. The adaptation equation becomes similar to the speaker adaptation equation using the MAP linear regression adaptation. In the experiments, the adapted models based on probabilistic 2-mode analysis showed performance improvement over the adapted models based on Tucker decomposition, which is a representative multilinear decomposition technique, for small amounts of adaptation data while maintaining good performance for large amounts of adaptation data.

Highlights

In automatic speech recognition (ASR) systems using hidden Markov models (HMMs) [1], mismatches between the training and testing conditions lead to performance degradation.
Speaker adaptation based on tensor analysis using Tucker decomposition [4] was investigated in [5], where bases were constructed from the multilinear decomposition of a tensor that consisted of the HMM mean vectors of training speakers.
We describe a speaker adaptation method using probabilistic 2-mode analysis, which is an application of probabilistic tensor analysis (PTA) [8] to the second-order tensor; PTA in turn applies probabilistic principal component analysis (PPCA) [9] to tensor objects.
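
As a rough illustration of the multilinear decomposition mentioned in the highlights above, the following is a minimal sketch of a Tucker decomposition computed by truncated higher-order SVD (HOSVD) in NumPy. It is not the procedure of [5] or of this article: the helper names (`unfold`, `mode_product`, `hosvd`, `reconstruct`), the tensor layout (speakers x rows x columns of each speaker's mean matrix), the toy sizes, and the random data are all assumptions made for illustration.

```python
import numpy as np

def unfold(tensor, mode):
    """Mode-n unfolding: bring axis `mode` to the front, flatten the rest."""
    return np.moveaxis(tensor, mode, 0).reshape(tensor.shape[mode], -1)

def mode_product(tensor, matrix, mode):
    """Multiply `tensor` by `matrix` along axis `mode` (mode-n product)."""
    moved = np.moveaxis(tensor, mode, 0)
    result = np.tensordot(matrix, moved, axes=(1, 0))
    return np.moveaxis(result, 0, mode)

def hosvd(tensor, ranks):
    """Truncated HOSVD: one orthonormal factor per mode plus a core tensor."""
    factors = []
    for mode, rank in enumerate(ranks):
        # Left singular vectors of the unfolding span that mode's subspace.
        u, _, _ = np.linalg.svd(unfold(tensor, mode), full_matrices=False)
        factors.append(u[:, :rank])
    core = tensor
    for mode, u in enumerate(factors):
        core = mode_product(core, u.T, mode)  # project onto each basis
    return core, factors

def reconstruct(core, factors):
    """Map the core tensor back to the original space."""
    out = core
    for mode, u in enumerate(factors):
        out = mode_product(out, u, mode)
    return out

# Hypothetical toy data: 4 training speakers, each model a 5 x 6 mean matrix.
rng = np.random.default_rng(0)
models = rng.standard_normal((4, 5, 6))

# Keeping every component reproduces the tensor up to rounding error.
full_core, full_factors = hosvd(models, ranks=(4, 5, 6))
full_error = np.linalg.norm(models - reconstruct(full_core, full_factors))

# Truncating the ranks gives a smaller core and a lossy approximation.
small_core, small_factors = hosvd(models, ranks=(2, 3, 3))
small_error = np.linalg.norm(models - reconstruct(small_core, small_factors))
```

In this sketch the factor matrices play the role of bases for each mode, and the truncated ranks control how many basis vectors are kept. How the bases are weighted for a new speaker is a separate step that the sketch does not cover.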

Summary

Introduction

In automatic speech recognition (ASR) systems using hidden Markov models (HMMs) [1], mismatches between the training and testing conditions lead to performance degradation. One such mismatch results from speaker variation. The experiments showed that, for small amounts of adaptation data, the proposed method improved on the performance of speaker adaptation based on Tucker decomposition.

Multilinear decomposition

Speaker adaptation using Tucker decomposition

Construction of probabilistic 2-mode model for speaker adaptation

Method

Findings

Conclusions


Journal: EURASIP Journal on Audio, Speech, and Music Processing
Publication Date: Apr 11, 2013
Citations: 10
License type: CC BY 2.0

Similar Papers

Adaptation of Hidden Markov Models Using Model-as-Matrix Representation
Yongwon Jeong
IEEE Transactions on Audio, Speech, and Language Processing | VOL. 20
01 Oct 2012

Speaker adaptation using improved MAP estimation with small amount of adaptation data
Takuya Futagami ... Noboru Hayasaka
01 Oct 2013

Speaker normalization and adaptation based on linear transformation
J Ishii ... M Tonomura
21 Apr 1997

A fast algorithm for unsupervised incremental speaker adaptation
M Schussler ... F Gallwitz
21 Apr 1997
