Abstract
Articulatory speech synthesis based on aero-acoustic simulations of the vocal tract is computationally expensive and therefore requires simple yet precise models. Modeling the one-dimensional vocal tract area function directly, instead of a higher-dimensional vocal tract model, is an efficient way to minimize the computational overhead of the simulations. In this paper, we propose a new parametric vocal tract model that is controlled by six points and is capable of modeling a large variety of vocal tract shapes. We evaluated the model geometrically and perceptually on a set of 22 reference area functions corresponding to German vowels and consonants. Across these reference area functions, the geometric approximation yielded a minimum root-mean-square error of 0.302 cm$^2$, a maximum error of 1.142 cm$^2$, and a median error of 0.891 cm$^2$. After optimization, a perceptual evaluation of the synthesis using our model in combination with a state-of-the-art aero-acoustic simulation achieved a vowel recognition rate of 90.7% and a consonant recognition rate of 73.2%.
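As a rough illustration of the geometric evaluation described above, the following sketch computes the root-mean-square error between a reference area function and an approximation built from a handful of control points, and then summarizes the errors over several area functions. It is not the model from the paper: the abstract does not state how the six points are interpolated, so linear interpolation is used here purely as a stand-in, and the names `piecewise_linear_area`, `rmse`, and `summarize` are hypothetical.

```python
import math
import statistics


def piecewise_linear_area(control_points, positions):
    """Sample an area function (cm^2) at the given positions (cm).

    ASSUMPTION: linear interpolation between (position, area) control
    points. The paper's actual six-point parametrization may differ.
    """
    pts = sorted(control_points)
    areas = []
    for x in positions:
        # Hold the end values constant outside the control-point range.
        if x <= pts[0][0]:
            areas.append(pts[0][1])
            continue
        if x >= pts[-1][0]:
            areas.append(pts[-1][1])
            continue
        # Find the segment containing x and interpolate linearly.
        for (x0, a0), (x1, a1) in zip(pts, pts[1:]):
            if x0 <= x <= x1 and x1 > x0:
                t = (x - x0) / (x1 - x0)
                areas.append(a0 + t * (a1 - a0))
                break
    return areas


def rmse(reference, approximation):
    """Root-mean-square error between two equally sampled area functions."""
    if len(reference) != len(approximation) or not reference:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = [(r - a) ** 2 for r, a in zip(reference, approximation)]
    return math.sqrt(sum(squared) / len(squared))


def summarize(errors):
    """Minimum, maximum, and median of per-area-function errors."""
    return {
        "min": min(errors),
        "max": max(errors),
        "median": statistics.median(errors),
    }
```

In use, one would sample each reference area function and its approximation at the same positions along the vocal tract, call `rmse` once per pair, and pass the resulting list to `summarize` to obtain figures of the kind reported in the abstract.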
More From: IEEE/ACM Transactions on Audio, Speech, and Language Processing