Abstract

Recently, we established the equivalence of an ergodic HMM (EHMM) to a parallel sub-word recognition (PSWR) framework for language identification (LID). The states of the EHMM correspond to the acoustic units of a language, and its state transitions represent the bigram language model of unit sequences. We consider two alternatives for representing the state-observation densities of the EHMM: the Gaussian mixture model (GMM) and the hidden Markov model (HMM), yielding EHMM(G) and EHMM(H), respectively. We present a segmental K-means algorithm for training both types of EHMM and compare their performance on a 6-language LID task on the OGI-TS database. EHMM(G) performs comparably to PSWR and outperforms EHMM(H); we provide reasons for this performance difference and identify ways of enhancing the performance of EHMM(H), which is a novel and powerful architecture well suited to spoken language modeling.
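To make the training procedure named in the abstract concrete, here is a minimal, illustrative sketch of segmental K-means for an ergodic HMM. It simplifies the state-observation densities to single Gaussians (means only) rather than full GMMs or HMMs, and uses a frame-level nearest-mean segmentation in place of a full Viterbi alignment; the function name, parameters, and simplifications are assumptions for illustration, not the paper's implementation.

```python
import numpy as np

def segmental_kmeans(frames, n_states, n_iter=10, seed=0):
    """Toy segmental K-means for an ergodic HMM (EHMM).

    frames   : (T, D) array of acoustic feature vectors.
    n_states : number of EHMM states (acoustic units).

    Alternates a segmentation step (assign each frame to its nearest
    state mean, a stand-in for Viterbi alignment) with an estimation
    step (re-estimate each state's mean from its assigned frames).
    Returns the state means, the final frame-to-state labels, and a
    bigram transition matrix estimated from the label sequence.
    """
    rng = np.random.default_rng(seed)
    # Initialise state means from randomly chosen frames.
    means = frames[rng.choice(len(frames), n_states, replace=False)]
    labels = np.zeros(len(frames), dtype=int)
    for _ in range(n_iter):
        # Segmentation: hard-assign each frame to the nearest state.
        dists = ((frames[:, None, :] - means[None, :, :]) ** 2).sum(-1)
        labels = dists.argmin(axis=1)
        # Estimation: update each state's mean from its frames.
        for s in range(n_states):
            if np.any(labels == s):
                means[s] = frames[labels == s].mean(axis=0)
    # Bigram language model of unit (state) sequences: count and
    # row-normalise state-to-state transitions.
    bigram = np.zeros((n_states, n_states))
    for a, b in zip(labels[:-1], labels[1:]):
        bigram[a, b] += 1
    row = bigram.sum(axis=1, keepdims=True)
    trans = np.divide(bigram, row, out=np.zeros_like(bigram),
                      where=row > 0)
    return means, labels, trans
```

In the paper's setting, the estimation step would re-train a GMM (for EHMM(G)) or an HMM (for EHMM(H)) per state instead of a single mean, but the alternating segment-then-estimate structure is the same.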
