Speaker adaptation shifts the speaker-independent acoustic model toward the speech characteristics of a new speaker in order to improve speech recognition performance. Kernel eigenspace-based speaker adaptation methods provide satisfactory performance using only a small amount of adaptation data. In these methods, kernel principal component analysis (KPCA) is applied to the training speaker space to create a kernel eigenspace, and the acoustic model adapted to the new speaker is then computed in that space. A limitation of KPCA is that the model adapted in the kernel eigenspace has no exact pre-image in the speaker space, so adaptation becomes computationally expensive. Previously developed approximations of this pre-image do not necessarily yield optimal results. In this paper, we therefore propose an efficient solution that constructs a more reliable pre-image of the adapted model in the speaker space. To this end, we employ a latent variable model to define a probabilistic description of the mapping between the kernel eigenspace and the speaker space. Experiments were conducted on two speech databases: FARSDAT (Persian) and TIMIT (English). Using a typical HMM-based automatic speech recognition system, the proposed method, with about three seconds of adaptation data, achieves up to 4.4% and 7.6% relative improvement in phoneme recognition accuracy over the speaker-independent model on FARSDAT and TIMIT, respectively. Moreover, the proposed approach outperforms other kernel eigenspace-based adaptation methods.
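As a rough illustration of the kernel eigenspace adaptation pipeline described above (KPCA over training-speaker representations, adaptation in the eigenspace, and an approximate pre-image back to the speaker space), the sketch below uses scikit-learn's KernelPCA. The speaker "supervectors", their dimensions, the RBF kernel choice, and the rough estimate of the new speaker's point are all illustrative assumptions, and scikit-learn's kernel-ridge-based inverse transform stands in for a generic approximate pre-image; it is not the latent-variable pre-image model proposed in the paper.

```python
import numpy as np
from sklearn.decomposition import KernelPCA

# Hypothetical setup: each row is a "speaker supervector" (e.g., concatenated
# HMM Gaussian means of one training speaker). Shapes are illustrative only.
rng = np.random.default_rng(0)
n_speakers, dim = 100, 200
speaker_supervectors = rng.standard_normal((n_speakers, dim))

# Build the kernel eigenspace with KPCA (RBF kernel as an example choice).
# fit_inverse_transform=True makes scikit-learn learn an approximate pre-image
# map via kernel ridge regression; this is a stand-in for the paper's
# latent-variable pre-image model, which is NOT implemented here.
kpca = KernelPCA(n_components=10, kernel="rbf", gamma=1e-3,
                 fit_inverse_transform=True, alpha=0.1)
kpca.fit(speaker_supervectors)

# Adaptation sketch: from a few seconds of adaptation data one would estimate
# the new speaker's coordinates in the kernel eigenspace; here a placeholder
# supervector is simply projected into that space.
new_speaker_rough = rng.standard_normal((1, dim))      # placeholder estimate
adapted_weights = kpca.transform(new_speaker_rough)    # point in eigenspace

# Pre-image step: map the adapted point back to the speaker (supervector)
# space. An exact pre-image generally does not exist, so this is approximate.
adapted_supervector = kpca.inverse_transform(adapted_weights)
print(adapted_supervector.shape)                       # (1, dim)
```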