Abstract
A novel speaker adaptation algorithm based on Gaussian mixture weight adaptation is described. A small number of latent speaker vectors are estimated with non-negative matrix factorization (NMF). These latent vectors encode the distinctive, systematic patterns of Gaussian usage observed when modeling the individual speakers that make up the training data. Expressing the speaker-dependent Gaussian mixture weights as a linear combination of a small number of latent vectors reduces the number of parameters that must be estimated from the enrollment data. The resulting fast adaptation algorithm, using only 3 s of enrollment data, achieves performance similar to fMLLR adapted on more than 100 s of data. In order to learn richer Gaussian usage patterns from the training data, the NMF-based weight adaptation is combined with vocal tract length normalization (VTLN) and speaker adaptive training (SAT), or with a simple Gaussian exponentiation scheme that lowers the dynamic range of the Gaussian likelihoods. Evaluation on the Wall Street Journal tasks shows a 5% relative word error rate (WER) reduction over the speaker-independent recognition system, which already incorporates VTLN. The WER can be lowered further by combining weight adaptation with Gaussian mean adaptation by means of eigenvoice speaker adaptation.
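To make the core idea concrete, the following is a minimal sketch of NMF-based mixture-weight adaptation. It assumes a matrix of per-speaker Gaussian occupancy counts and uses scikit-learn's NMF as a stand-in factorization; the variable names, shapes, global renormalization, and the choice of library are illustrative assumptions, not the paper's actual implementation (which may use different divergences, per-state normalization, and update rules).

```python
# Hypothetical sketch: adapt Gaussian mixture weights via NMF (not the paper's code).
import numpy as np
from sklearn.decomposition import NMF

n_speakers, n_gaussians, n_latent = 50, 2000, 10

# Toy stand-in for Gaussian occupancy counts accumulated per training speaker
# (rows: speakers, columns: Gaussians).
rng = np.random.default_rng(0)
V = rng.random((n_speakers, n_gaussians))

# Factor V ~= A @ H: rows of H are latent Gaussian-usage patterns,
# rows of A are the small per-speaker combination coefficients.
nmf = NMF(n_components=n_latent, init="nndsvda", max_iter=500)
A = nmf.fit_transform(V)          # (n_speakers, n_latent)
H = nmf.components_               # (n_latent, n_gaussians)

# Adaptation: with H fixed, estimate coefficients for a new speaker from a
# short enrollment utterance, then rebuild that speaker's mixture weights.
v_enroll = rng.random((1, n_gaussians))   # enrollment occupancy counts (toy data)
a_new = nmf.transform(v_enroll)           # (1, n_latent), only n_latent values estimated
weights = a_new @ H
weights /= weights.sum()                  # simplification: real systems renormalize per mixture/state
```

The point of the sketch is the parameter count: only the handful of combination coefficients (here 10) are estimated from the short enrollment data, while the latent usage patterns in H are learned once from all training speakers.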