VOICE MIMIC SYSTEM USING AN ARTICULATORY CODEBOOK FOR ESTIMATION OF VOCAL TRACT SHAPES

Chennoukh, D. Sinder, G. Richard* and J.L. Flanagan
Center for Computer Aids for Industrial Productivity (CAIP), Rutgers University, Piscataway, NJ 08855-1390, USA
*Matra-Communication, rue J.P. Timbaud, 78392 Bois d'Arcy, France
Tel. +1 908-445-0080, FAX: +1 908 445-4775, E-mail: chenoukh@caip.rutgers.edu

ABSTRACT

Voice mimic systems using articulatory codebooks require an initial estimate of the vocal tract shape in the vicinity of the global optimum. For this purpose, we need to gather a large set of corresponding articulatory and acoustic data in the articulatory codebook. As a result, searching and accessing the codebook becomes a difficult task. In this paper, we present the design of an articulatory codebook in which an acoustic network sub-samples the acoustic space, so that vocal tract model shapes are ordered and clustered in the network according to their acoustic parameters. Another issue addressed in this paper is estimating the trajectory of vocal tract shapes as they change over time. Since the inverse mapping from acoustic parameters to model shape does not have a unique solution, several vocal tract shape variations are possible. Therefore, a dynamic optimization of trajectories has been developed. This optimization uses the dynamic properties of each articulatory parameter to estimate its next position.

1. INTRODUCTION

The study of speech perception and production has been enhanced in the last two decades by the development of computers capable of large amounts of computation. As a result, Stevens' work towards an articulatory model for speech recognition-synthesis has become more feasible than it was in the early sixties ([9]). However, an incomplete understanding of speech production and the acoustics of … prevented us from achieving Stevens' goal. The goal was to mimic input speech signals by recognition-synthesis, using a model of the vocal tract area function that can mimic the speech signals without understanding their
structure or meaning.

An early attempt at creating a complete computer simulation of articulatory-model speech coding using an optimization technique was reported by Flanagan et al. ([4]). The simulation is called "voice mimic." The voice mimic attempts to provide an articulatory description of the vocal tract that corresponds to an arbitrary natural speech input, and to generate a synthetic signal that duplicates the natural one within perceptual accuracy. Central to this effort is the inverse mapping from an acoustic signal to an articulatory description. However, acoustic-to-articulatory mappings are non-unique and, given a cost function, the optimization techniques converge only to a local extremum that may lie in the vicinity of the initial parameters. Therefore, one needs to choose accurate startup parameters to initialize the optimization procedure. Schroeter and Sondhi ([8]), who continued along the same lines as Flanagan et al., used an articulatory codebook proposed earlier by Atal et al. ([1]). Since a codebook is used to obtain the first estimates of the vocal tract shape that may produce a given combination of acoustic parameters, it must be designed to span the natural articulatory space of a speaker. Furthermore, the sampling of this space must be fine enough that an acoustic entry always exists very close to the global optimum. Such codebooks require a large set of matching pairs of vocal tract and acoustic parameters, so the complexity of searching a large codebook for all possible vocal tract model shapes becomes an issue. For this reason, the voice mimic system needs, in addition to a good articulatory codebook, an efficient procedure for accessing the codebook ([6], [7]).

The number and position of the codebook vectors affect the performance of the voice mimic system through two conflicting requirements. On the one hand, increasing the size of the codebook makes the access task more difficult; on the other hand, reducing its size complicates the solution of the inverse problem. In the second section of this paper, we present a new design of the articulatory codebook in which the inversion of the articulatory-to-acoustic mapping is performed while the codebook is being built. This codebook design allows real-time access to the set of acoustically equivalent shapes, regardless of the size of the codebook.

Since the inverse mapping from acoustic parameters to model shape does not have a unique solution, several vocal tract shape variations are possible. Schroeter and Sondhi ([7]) proposed the use of dynamic programming to estimate the optimal trajectory of the vocal tract model shape. Dynamic programming, however, delays the speech output by several data frames ([8]). In the third section, we propose a method in which the articulatory parameters are estimated within one frame.
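The codebook access and one-frame trajectory estimation described above can be illustrated with a small sketch. It is not the authors' method. It assumes a codebook stored as a plain list of (articulatory vector, acoustic vector) pairs and searches it by brute force rather than through the acoustic network the paper proposes. It also stands in for the "dynamic properties of each articulatory parameter" with a hypothetical constant-velocity predictor. At each frame, the k acoustically closest shapes serve as the acoustically equivalent candidates. The candidate that best balances acoustic match against distance from the predicted position is kept, so no look-ahead delay is needed. The names `nearest_candidates`, `predict_next`, `track`, `k`, and `weight` are illustrative only.

```python
import math
from typing import List, Sequence, Tuple

# Hypothetical codebook entry: (articulatory parameters, acoustic parameters).
Entry = Tuple[Tuple[float, ...], Tuple[float, ...]]


def nearest_candidates(codebook: Sequence[Entry], acoustic: Sequence[float], k: int) -> List[Entry]:
    """Return the k entries whose acoustic vectors are closest (Euclidean) to `acoustic`.

    Brute-force search for illustration only; the paper instead orders and
    clusters shapes in an acoustic network so that access stays fast.
    """
    return sorted(codebook, key=lambda e: math.dist(e[1], acoustic))[:k]


def predict_next(prev2: Sequence[float], prev1: Sequence[float]) -> List[float]:
    """Extrapolate each articulatory parameter at constant velocity (an assumption)."""
    return [2.0 * b - a for a, b in zip(prev2, prev1)]


def track(codebook: Sequence[Entry], frames: Sequence[Sequence[float]],
          k: int = 3, weight: float = 1.0) -> List[List[float]]:
    """Choose one shape per acoustic frame causally, i.e. without look-ahead delay."""
    path: List[List[float]] = []
    for acoustic in frames:
        candidates = nearest_candidates(codebook, acoustic, k)
        if len(path) < 2:
            # Too little history to predict: keep the acoustically closest shape.
            best = candidates[0]
        else:
            predicted = predict_next(path[-2], path[-1])
            # Trade acoustic match against distance from the predicted position.
            best = min(candidates,
                       key=lambda e: math.dist(e[1], acoustic)
                       + weight * math.dist(e[0], predicted))
        path.append(list(best[0]))
    return path


if __name__ == "__main__":
    # Shapes (-2.0,) and (2.0,) are acoustically equivalent: both map to (2.0,).
    codebook = [((0.0,), (0.0,)), ((1.0,), (1.0,)), ((-2.0,), (2.0,)),
                ((2.0,), (2.0,)), ((3.0,), (3.0,))]
    print(track(codebook, [[0.0], [1.0], [2.0]]))  # [[0.0], [1.0], [2.0]]
```

In the usage example, two different shapes produce the same acoustic value 2.0. An acoustic-only lookup would return the first of them (-2.0), whereas the position predicted from the previous two frames selects 2.0. The additive cost, the weight, and the choice of k are design assumptions; a practical system would also need to normalize the acoustic and articulatory terms against each other.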