Abstract

Acoustic-to-articulatory inversion for vowels is performed by cepstral analysis-by-synthesis, using chain-matrix calculation of vocal tract (VT) acoustics and the Maeda articulatory model. The derivative of the VT chain matrix with respect to the area function is calculated in a novel, efficient manner and used in the BFGS quasi-Newton method to optimize a distance measure between input and synthesized cepstral features over the entire articulatory trajectory. The optimization is initialized by a fast search of an articulatory codebook with a bin structure in formant space. The cost function also includes regularization and continuity terms to obtain realistic inverted VT shapes and smooth articulatory trajectories. Inversion is evaluated on the three diphthongs /ai/, /oi/ and /au/ of two speakers, one male and one female, from the University of Wisconsin X-ray microbeam (XRMB) database. Good agreement is achieved between inverted midsagittal vocal tract outlines and measured XRMB tongue and lip pellet positions, with an average relative error of less than 3% in the first three formants.
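
The structure of such a trajectory-level cost function can be illustrated with a minimal sketch. The abstract does not give the exact form of the terms, so everything here is an assumption made for illustration: squared Euclidean distances, a regularizer that pulls articulatory parameters toward zero, a continuity term on frame-to-frame differences, the weights `reg_weight` and `cont_weight`, and the hypothetical `synthesize` callable that stands in for the articulatory model plus chain-matrix acoustics.

```python
from typing import Callable, List, Sequence

Vector = Sequence[float]


def sq_dist(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def trajectory_cost(
    trajectory: List[Vector],                   # articulatory parameters per frame
    targets: List[Vector],                      # measured cepstral features per frame
    synthesize: Callable[[Vector], Vector],     # hypothetical articulatory-to-cepstrum map
    reg_weight: float = 0.1,                    # assumed weight, not from the paper
    cont_weight: float = 1.0,                   # assumed weight, not from the paper
) -> float:
    """Sum of data, regularization, and continuity terms over all frames."""
    if len(trajectory) != len(targets):
        raise ValueError("trajectory and targets must have the same number of frames")

    # Data term: distance between synthesized and measured features.
    data = sum(sq_dist(synthesize(p), c) for p, c in zip(trajectory, targets))

    # Regularization term (assumed form): penalize parameters far from zero.
    reg = sum(sq_dist(p, [0.0] * len(p)) for p in trajectory)

    # Continuity term (assumed form): penalize jumps between adjacent frames.
    cont = sum(sq_dist(p, q) for p, q in zip(trajectory[1:], trajectory[:-1]))

    return data + reg_weight * reg + cont_weight * cont


# Toy usage with an identity "synthesizer" in place of a real VT model.
toy_cost = trajectory_cost(
    trajectory=[[0.0, 0.0], [1.0, 0.0]],
    targets=[[0.0, 0.0], [1.0, 0.0]],
    synthesize=lambda p: list(p),
)
```

Because the continuity term couples neighbouring frames, the cost has to be minimized over the whole trajectory at once rather than frame by frame. In practice the flattened trajectory could be handed to a quasi-Newton routine together with a gradient of this cost, which is where an efficient chain-matrix derivative would be used.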
