Abstract

A new method for estimating the overall vocal-tract (VT) length and normalizing the acoustic parameters of different speakers is reported in this paper for acoustic-to-articulatory mapping. The main goal of this work was to achieve high accuracy in estimating VT length from a short speech utterance. An articulatory model, originally developed by Maeda, was used as a reference female VT. Linear scaling was used to synthesize training data for VT lengths between 100% and 125% of the reference VT length (14.96 cm). These data comprised 250 utterances, produced with different VT lengths, each containing six vowels. A neural network with two hidden layers was trained using vectors of 10 mel-frequency cepstrum coefficients and the corresponding VT lengths of these utterances. For the same VT length range, similar test data were synthesized using the training vowels but in different contexts. Evaluation of the trained network on the test data showed an average error of less than 1% and a maximum error of 3.2% in estimating VT length from a single test utterance. Frequency warping was used to normalize the cepstrum parameters according to estimated length factors ranging between 1.0 and 1.25. [This work was supported by NSERC.]
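
The following is a minimal sketch of the mapping described above: per-frame vectors of 10 mel-frequency cepstrum coefficients are regressed onto a VT length factor by a network with two hidden layers, and the estimate is then used to warp the frequency axis. The scikit-learn MLPRegressor, the hidden-layer sizes, the random placeholder features and labels, and the simple linear warping function are all assumptions for illustration, not the authors' actual training setup or normalization.

```python
# Sketch only: regress a VT length factor (1.00-1.25) from 10-dim MFCC vectors
# with a two-hidden-layer network, then apply a simple frequency warp.
import numpy as np
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(0)

# Hypothetical training set: one 10-dim MFCC vector per frame, labelled with
# the length factor of the scaled vocal tract that produced it.
n_frames = 5000
X_train = rng.normal(size=(n_frames, 10))            # stand-in MFCC vectors
y_train = rng.uniform(1.00, 1.25, size=n_frames)     # stand-in length factors

net = MLPRegressor(hidden_layer_sizes=(32, 32),      # two hidden layers
                   max_iter=2000, random_state=0)
net.fit(X_train, y_train)

# At test time, average per-frame estimates over a short utterance to obtain
# a single length factor for that speaker.
X_utt = rng.normal(size=(120, 10))                   # frames of one utterance
length_factor = float(np.mean(net.predict(X_utt)))

def warp_frequency(f_hz, factor):
    """Linear frequency warping f -> f/factor, a simple stand-in for the
    paper's normalization of cepstrum parameters by the estimated factor."""
    return f_hz / factor

print(f"estimated length factor: {length_factor:.3f}")
print(f"2500 Hz maps to {warp_frequency(2500.0, length_factor):.1f} Hz")
```

In practice the MFCC vectors would be computed from the synthesized vowel utterances rather than drawn at random, and the warping would be applied before recomputing the normalized cepstrum parameters.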
