Abstract

Recently, we established the equivalence of an ergodic HMM (EHMM) to a parallel sub-word recognition (PSWR) framework for language identification (LID). The states of the EHMM correspond to the acoustic units of a language, and its state transitions represent the bigram language model of unit sequences. We consider two alternatives for representing the state-observation densities of the EHMM: the Gaussian mixture model (GMM) and the hidden Markov model (HMM), yielding EHMM(G) and EHMM(H), respectively. We present a segmental K-means algorithm for training both types of EHMM and compare their performance on a 6-language LID task on the OGI-TS database. EHMM(G) performs comparably to PSWR and outperforms EHMM(H); we provide reasons for this performance difference and identify ways of enhancing the performance of EHMM(H), which is a novel and powerful architecture well suited to spoken language modeling.
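To make the training procedure named in the abstract concrete, here is a minimal, illustrative sketch of segmental K-means for an ergodic HMM. It simplifies the state-observation densities to single Gaussians (means only) rather than full GMMs or HMMs, and uses a frame-level nearest-mean segmentation in place of a full Viterbi alignment; the function name, parameters, and simplifications are assumptions for illustration, not the paper's implementation.

```python
import numpy as np

def segmental_kmeans(frames, n_states, n_iter=10, seed=0):
    """Toy segmental K-means for an ergodic HMM (EHMM).

    frames   : (T, D) array of acoustic feature vectors.
    n_states : number of EHMM states (acoustic units).

    Alternates a segmentation step (assign each frame to its nearest
    state mean, a stand-in for Viterbi alignment) with an estimation
    step (re-estimate each state's mean from its assigned frames).
    Returns the state means, the final frame-to-state labels, and a
    bigram transition matrix estimated from the label sequence.
    """
    rng = np.random.default_rng(seed)
    # Initialise state means from randomly chosen frames.
    means = frames[rng.choice(len(frames), n_states, replace=False)]
    labels = np.zeros(len(frames), dtype=int)
    for _ in range(n_iter):
        # Segmentation: hard-assign each frame to the nearest state.
        dists = ((frames[:, None, :] - means[None, :, :]) ** 2).sum(-1)
        labels = dists.argmin(axis=1)
        # Estimation: update each state's mean from its frames.
        for s in range(n_states):
            if np.any(labels == s):
                means[s] = frames[labels == s].mean(axis=0)
    # Bigram language model of unit (state) sequences: count and
    # row-normalise state-to-state transitions.
    bigram = np.zeros((n_states, n_states))
    for a, b in zip(labels[:-1], labels[1:]):
        bigram[a, b] += 1
    row = bigram.sum(axis=1, keepdims=True)
    trans = np.divide(bigram, row, out=np.zeros_like(bigram),
                      where=row > 0)
    return means, labels, trans
```

In the paper's setting, the estimation step would re-train a GMM (for EHMM(G)) or an HMM (for EHMM(H)) per state instead of a single mean, but the alternating segment-then-estimate structure is the same.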
