Abstract

This paper presents a study of two acoustic speaker adaptation techniques applied in the context of the subspace Gaussian mixture model (SGMM) for automatic speech recognition (ASR). First, a model-space linear-regression-based approach is presented for adaptation of SGMM state projection vectors, referred to as subspace vector adaptation (SVA). Second, an easy-to-implement realization of constrained maximum likelihood linear regression (CMLLR) is presented for feature-space adaptation in the SGMM. Numerically stable procedures for row-by-row estimation of the regression-based transformation matrices are presented for both SVA and CMLLR adaptation. These approaches are applied to SGMM models estimated using speaker adaptive training (SAT), a technique for producing more compact speaker-independent acoustic models. Unsupervised speaker adaptation performance is evaluated on conversational and read speech task domains and compared to unsupervised adaptation performance obtained using the hidden Markov model-Gaussian mixture model (HMM-GMM) in ASR. It is shown that the feature-space and model-space adaptation approaches applied to the SGMM provide complementary reductions in word error rate (WER) and yield lower WERs than those obtained using CMLLR adaptation for the HMM-GMM.
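To make the feature-space adaptation concrete: at decoding time, CMLLR applies a per-speaker affine transform to each acoustic feature frame. The sketch below is a minimal illustration of applying such a transform with NumPy; the function name and array shapes are assumptions for illustration, not the paper's implementation (which also covers the row-by-row estimation of the transform itself).

```python
import numpy as np

def apply_cmllr(features, A, b):
    """Apply a CMLLR feature-space transform x_hat = A @ x + b to each frame.

    features: (T, d) array of T acoustic frames, each d-dimensional.
    A: (d, d) transform matrix estimated for one speaker.
    b: (d,) bias vector estimated for the same speaker.
    Returns the (T, d) array of transformed frames.
    """
    return features @ A.T + b

# Toy usage: with the identity transform and zero bias,
# the features are unchanged.
T, d = 5, 3
feats = np.arange(T * d, dtype=float).reshape(T, d)
out = apply_cmllr(feats, np.eye(d), np.zeros(d))
```

In practice `A` and `b` are estimated per speaker by maximizing the likelihood of that speaker's data under the speaker-independent model, which is what makes the transform "constrained": the same affine map is shared across all Gaussians.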
