Unsupervised language model adaptation using latent semantic marginals

Yik-Cheung Tam, Tanja Schultz

doi:10.21437/interspeech.2006-573

Abstract

We integrated Latent Dirichlet Allocation (LDA), a latent semantic analysis model, into an unsupervised language model (LM) adaptation framework. We adapted a background LM by minimizing the Kullback-Leibler divergence between the adapted model and the background model, subject to the constraint that the marginalized unigram distribution of the adapted model equals the corresponding distribution estimated by the LDA model (the latent semantic marginals). We evaluated our approach on the RT04 Mandarin Broadcast News test set and experimented with different LM training settings. Results showed that our approach reduces both perplexity and character error rate under supervised and unsupervised adaptation.

Index Terms: unsupervised LM adaptation, LSA marginals, Latent Dirichlet Allocation, Mandarin Broadcast News
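The adaptation described in the abstract can be illustrated with a small sketch. It assumes the commonly used closed-form approximation to the constrained KL minimization, often called unigram scaling. Under this approximation, each background probability P_B(w|h) is multiplied by alpha(w) = (P_LDA(w) / P_B(w))^beta and renormalized per history h. It also assumes the LDA marginal is P_LDA(w) = sum_k theta_k P(w|k), where theta is the topic mixture inferred for the test document. The helper names `lda_marginal` and `unigram_scale`, the toy vocabulary and topics, and beta = 0.5 are all hypothetical. This approximation does not enforce the marginal constraint exactly, and it should not be read as the paper's implementation.

```python
from typing import Dict, List

Dist = Dict[str, float]


def lda_marginal(topic_word: List[Dist], theta: List[float]) -> Dist:
    """Latent semantic marginal: P_lda(w) = sum_k theta_k * P(w | topic k)."""
    marginal: Dist = {}
    for weight, dist in zip(theta, topic_word):
        for w, p in dist.items():
            marginal[w] = marginal.get(w, 0.0) + weight * p
    return marginal


def unigram_scale(bg_cond: Dict[str, Dist], bg_unigram: Dist,
                  lda_unigram: Dist, beta: float = 0.5) -> Dict[str, Dist]:
    """Approximate MDI adaptation by unigram scaling (hypothetical helper).

    Assumes every word in bg_cond also appears in bg_unigram and lda_unigram.
    The marginal constraint is only approximated, not solved exactly.
    """
    # Scaling factor per word; beta = 0 leaves the background model unchanged.
    alpha = {w: (lda_unigram[w] / bg_unigram[w]) ** beta for w in bg_unigram}
    adapted: Dict[str, Dist] = {}
    for h, dist in bg_cond.items():
        scaled = {w: alpha[w] * p for w, p in dist.items()}
        z = sum(scaled.values())  # normalizer Z(h) for this history
        adapted[h] = {w: s / z for w, s in scaled.items()}
    return adapted


# Toy example: two hypothetical topics (finance, sports) over a 4-word vocabulary.
topic_word = [
    {"stock": 0.5, "market": 0.4, "goal": 0.05, "team": 0.05},
    {"stock": 0.05, "market": 0.05, "goal": 0.5, "team": 0.4},
]
theta = [0.9, 0.1]  # inferred mixture for a finance-heavy test document
lda = lda_marginal(topic_word, theta)

bg_unigram = {w: 0.25 for w in lda}
bg_cond = {"the": {w: 0.25 for w in lda}}  # single history "the"
adapted = unigram_scale(bg_cond, bg_unigram, lda, beta=0.5)
unchanged = unigram_scale(bg_cond, bg_unigram, lda, beta=0.0)
```

In this toy case the finance-heavy topic mixture raises P("stock" | "the") above its background value of 0.25 and lowers P("goal" | "the"). Setting beta = 0 returns the background model unchanged, so beta controls how strongly the LDA marginals pull the adapted model.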
