ABSTRACT

This paper investigates the effectiveness of the DAEM (Deterministic Annealing EM) algorithm in acoustic modeling for speaker and speech recognition. Although the EM algorithm has been widely used to approximate ML estimates, it suffers from dependence on its initialization. To relax this problem, the DAEM algorithm has been proposed, and its effectiveness has been confirmed on small tasks. In this paper, we apply the DAEM algorithm to speaker recognition based on GMMs and to continuous speech recognition based on HMMs. Experimental results show that the DAEM algorithm improves recognition performance compared with the ordinary EM algorithm using conventional initialization methods, especially in flat start training for continuous speech recognition.

1. INTRODUCTION

The EM (Expectation-Maximization) algorithm [1] is widely used for parameter estimation of statistical models with hidden variables. It provides a simple iterative procedure for obtaining approximate ML (maximum likelihood) estimates. However, since the EM algorithm is a hill-climbing approach, it suffers from the local maxima problem.

Meanwhile, GMMs (Gaussian mixture models) [2] and HMMs (hidden Markov models) [3] have been commonly used in acoustic modeling for speaker and speech recognition, respectively. In conventional approaches, the LBG algorithm for GMMs and the segmental k-means algorithm for HMMs are employed to obtain initial model parameters before applying the EM algorithm. However, these initial values are not guaranteed to lie near the true maximum likelihood point, and the posterior density is unreliable at an early stage of training. In continuous speech recognition in particular, it is difficult to obtain accurate phoneme boundaries for all training data. Hence, embedded training has been used, in which phoneme boundaries are also treated as hidden variables and estimated with the EM algorithm. Furthermore, in the worst case, where no boundary information is available, a method called flat start training is often applied: the initial parameters of the HMMs are obtained by making all states of all models equal, and embedded training is then carried out. In these situations, we do not have enough prior knowledge to obtain good initial values for the EM algorithm, and it may converge to a local maximum or saddle point.
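To make the contrast concrete, the following is a minimal NumPy sketch of GMM training with a DAEM-style tempered E-step. It is an illustration based on the commonly cited DAEM formulation (component posteriors raised to a temperature parameter beta that is annealed from a small value up to 1), not the authors' code; the function and parameter names (daem_gmm, betas, iters_per_beta) are invented for this example. Setting betas=(1.0,) reduces the loop to the ordinary EM algorithm.

import numpy as np

def log_gauss(x, mean, var):
    # Log density of a diagonal-covariance Gaussian, one value per sample.
    d = x.shape[1]
    return -0.5 * (d * np.log(2 * np.pi) + np.sum(np.log(var))
                   + np.sum((x - mean) ** 2 / var, axis=1))

def daem_gmm(x, n_mix, betas=(0.25, 0.5, 0.75, 1.0), iters_per_beta=10, seed=0):
    # Train a diagonal GMM; betas=(1.0,) recovers the ordinary EM algorithm.
    rng = np.random.default_rng(seed)
    n, d = x.shape
    # Deliberately crude initialization (global statistics plus small noise),
    # to illustrate that DAEM is meant to reduce initialization dependence.
    weights = np.full(n_mix, 1.0 / n_mix)
    means = x.mean(axis=0) + 0.01 * rng.standard_normal((n_mix, d))
    vars_ = np.tile(x.var(axis=0), (n_mix, 1))
    for beta in betas:
        for _ in range(iters_per_beta):
            # E-step: tempered posteriors, proportional to
            # (w_m * N(x | mu_m, var_m)) ** beta.
            log_post = np.stack([beta * (np.log(weights[m])
                                         + log_gauss(x, means[m], vars_[m]))
                                 for m in range(n_mix)], axis=1)
            log_post -= log_post.max(axis=1, keepdims=True)
            post = np.exp(log_post)
            post /= post.sum(axis=1, keepdims=True)
            # M-step: ordinary weighted ML re-estimation.
            occ = post.sum(axis=0)  # soft occupation counts
            weights = occ / n
            means = (post.T @ x) / occ[:, None]
            for m in range(n_mix):
                diff = x - means[m]
                vars_[m] = (post[:, m] @ (diff ** 2)) / occ[m] + 1e-6
    return weights, means, vars_

Because beta < 1 flattens the posteriors, assignments stay soft across components at high temperature and sharpen only as beta approaches 1, which is how the annealing is intended to relax the initialization dependence discussed above.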
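For flat start training, the initialization itself is simple to state in code. The following hypothetical helper (flat_start and its parameters are names invented here, and single-Gaussian diagonal-covariance states are assumed for brevity) assigns every state of every monophone HMM the same emission density, computed from the global statistics of the training frames; embedded training with EM or DAEM would then proceed from this uniform starting point.

import numpy as np

def flat_start(frames, phones, n_states=3):
    # frames: (n_frames, dim) array pooling all training feature vectors.
    g_mean = frames.mean(axis=0)
    g_var = frames.var(axis=0)
    models = {}
    for ph in phones:
        models[ph] = {
            # Identical emission parameters across all states and all models.
            "means": np.tile(g_mean, (n_states, 1)),
            "vars": np.tile(g_var, (n_states, 1)),
            # Equal probability of staying in a state or moving to the next.
            "trans": np.full((n_states, 2), 0.5),
        }
    return models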
