Abstract

A New Algorithm for Robust Speech Recognition: The Delta Vector Taylor Series Approach

Pedro J. Moreno and Brian Eberman
email: pjm@crl.dec.com, bse@crl.dec.com
Digital Equipment Corporation
Cambridge Research Laboratory

ABSTRACT

In this paper we present a new model-based compensation technique called Delta Vector Taylor Series (DVTS). This new technique extends and improves on the Vector Taylor Series (VTS) approach [7] and addresses several of its limitations. In particular, we present a new statistical representation for the distribution of clean speech feature vectors based on a weighted vector codebook. This change to the underlying probability density function (PDF) allows us to produce more accurate and stable solutions for our algorithm. The algorithm is also presented in an EM-MAP framework in which some of the environmental parameters are treated as random variables with known PDFs. Finally, we explore a new compensation approach based on the use of convex hulls. We evaluate our algorithm on a phonetic classification task on the TIMIT [5] database and also on a small-vocabulary speech recognition database. In both databases, artificial and natural noise is injected at several signal-to-noise ratios (SNRs). The algorithm achieves matched performance at all SNRs above 10 dB.

1. Introduction

In recent years several techniques have been proposed to deal with the problem of speech recognition in noisy environments. Some of them, such as PMC [3] or MLLR [6], have used the recognition engine and its rich statistical representation (more than 90,000 Gaussians in systems like SPHINX-3 and HTK [9]) to model and compensate for the effects of the environment on speech recognition systems.
Other techniques, such as CDCN [1] and POF [8], have instead used a reduced set of Gaussian mixtures (typically 256 or fewer) to model the speech feature vectors, and preprocess the noisy speech feature vectors to effectively clean them before they are processed by the recognition engine.

The use of a rich statistical representation improves performance, but has the drawback of requiring the whole speech recognition engine, with its associated complexity. An ideal robust recognition technique should have the advantages of a rich statistical representation while remaining simple and fast in operation. The Delta Vector Taylor Series (DVTS) approach is an attempt in this direction. It seeks to combine the benefits of a rich statistical representation with those of a low-complexity technique for robust speech recognition, and it pursues these goals by using a different statistical representation for the speech feature vectors.

The outline of the paper is as follows. In Section 2 we describe the DVTS algorithm. In Section 3 we briefly describe the modifications needed to make the algorithm work as a filter. In Section 4 we describe our experimental results, and finally in Section 5 we present our conclusions.

2. New Algorithm: Delta-VTS

DVTS models the speech feature vectors as a weighted sum of multidimensional Dirac deltas:

$$p(\mathbf{x}) = \sum_{k=0}^{M-1} P[k]\,\delta(\mathbf{x} - \boldsymbol{\mu}_k) \tag{1}$$

where each vector function $\delta(\mathbf{x} - \boldsymbol{\mu}_k)$ is modeled as a product of one-dimensional deltas:

$$\delta(\mathbf{x} - \boldsymbol{\mu}_k) = \prod_{i=0}^{D-1} \delta(x_i - \mu_{i,k}) \tag{2}$$

Here $M$ is the number of deltas, $D$ is the dimension of the feature vectors, $\mu_{i,k}$ is the $i$-th component of $\boldsymbol{\mu}_k$, and $P[k]$ is the a priori probability of observing the $k$-th delta. These probabilities must sum to one.

This novel representation of the PDF of $\mathbf{x}$ has several advantages. First, it greatly simplifies the mathematical assumptions of the VTS [7] algorithm. It also produces a simple, fast, robust, and direct formulation of the EM solutions already presented in [7].

In this paper we assume a model of the environment in which speech is corrupted by unknown additive stationary noise and unknown linear filtering:

$$Z(\omega) = X(\omega)\,|H(\omega)|^2 + N(\omega) \tag{3}$$

where $Z(\omega)$ represents the power spectrum of the degraded speech, $X(\omega)$ is the power spectrum of the clean speech, $|H(\omega)|^2$ is the transfer function of the linear filter, and $N(\omega)$ is the power spectrum of the additive noise. In the log-mel-spectral domain this can be expressed as

$$\mathbf{z} = \mathbf{x} + \log\big(\exp(\mathbf{q}) + \exp(\mathbf{n} - \mathbf{x})\big) \tag{4}$$

or, in more general terms,

$$\mathbf{z} = \mathbf{x} + f(\mathbf{x}, \mathbf{n}, \mathbf{q}) \tag{5}$$

where $\mathbf{z}$, $\mathbf{x}$, $\mathbf{q}$, and $\mathbf{n}$ are the log-mel-spectral counterparts of $Z(\omega)$, $X(\omega)$, $|H(\omega)|^2$, and $N(\omega)$, and the logarithm and exponential are applied element-wise.
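Because the delta-codebook PDF places all of its probability mass on the $M$ points $\boldsymbol{\mu}_k$, pushing it through the environment model with the noise $\mathbf{n}$ and channel $\mathbf{q}$ held fixed simply moves each delta: the noisy-speech PDF is again a weighted sum of deltas, now located at $\boldsymbol{\mu}_k + f(\boldsymbol{\mu}_k, \mathbf{n}, \mathbf{q})$, with the same weights $P[k]$. The Python sketch below illustrates this under stated assumptions: the toy codebook values and the helper names `log_add_exp`, `environment`, and `noisy_codebook` are hypothetical, and $\mathbf{n}$ and $\mathbf{q}$ are treated as known constants. It is not the authors' implementation, nor their EM or EM-MAP estimation procedure.

```python
import math

# Hypothetical toy codebook (M = 3 deltas, D = 2 log-mel channels).
# These numbers are illustrative only; they are not from the paper.
MU = [[1.0, 2.0], [3.0, 0.5], [-1.0, 1.5]]  # delta locations mu_k
P = [0.5, 0.3, 0.2]                          # a priori weights P[k]


def log_add_exp(a, b):
    """Numerically stable log(exp(a) + exp(b))."""
    m = max(a, b)
    return m + math.log1p(math.exp(-abs(a - b)))


def environment(x, n, q):
    """Eq. (4), element-wise: z_i = x_i + log(exp(q_i) + exp(n_i - x_i))."""
    return [xi + log_add_exp(qi, ni - xi) for xi, ni, qi in zip(x, n, q)]


def noisy_codebook(mu, p, n, q):
    """Move every delta through Eq. (4), assuming n and q are fixed.

    Because p(x) is a sum of point masses, the noisy PDF is again a sum
    of point masses at environment(mu_k, n, q) with unchanged weights P[k].
    """
    if not math.isclose(sum(p), 1.0):
        raise ValueError("the weights P[k] must sum to one")
    return [environment(mu_k, n, q) for mu_k in mu], list(p)


if __name__ == "__main__":
    noise = [0.0, -1.0]   # hypothetical log-mel noise vector n
    channel = [0.5, 0.0]  # hypothetical log channel vector q = log|H|^2
    z_mu, z_p = noisy_codebook(MU, P, noise, channel)
    for k, (z_k, p_k) in enumerate(zip(z_mu, z_p)):
        print(k, [round(v, 3) for v in z_k], p_k)
```

Factoring out the larger exponent before calling `log1p` keeps `exp(n - x)` from overflowing when the noise dominates a channel. Each mapped value also agrees with taking the logarithm of Eq. (3) directly, $\log(\exp(x_i + q_i) + \exp(n_i))$, which is a quick consistency check on Eq. (4).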
