Abstract

The length of the vocal tract and its relationship with formant frequencies is examined at fine temporal scales with the goal of providing accurate estimates of vocal tract length from acoustics on a spectrum-by-spectrum basis despite unknown articulatory information. Accurate vocal tract length estimation is motivated by applications to speaker normalization and biometrics. Analyses presented are both theoretical and empirical. Various theoretical models are used to predict the behavior of vocal tract resonances in the presence of different vocal tract lengths and constrictions. Real-time MRI with synchronized audio is also utilized for detailed measurements of vocal tract length and formant frequencies during running speech, facilitating the examination of short-time changes in vocal tract length and corresponding changes in formant frequencies, both within and across speakers. Previously proposed methods for estimating vocal tract length are placed within a coherent framework, and their effectiveness is evaluated and compared. A data-driven method for VTL estimation emerges as a natural extension of this framework, which is then developed and shown to empirically outperform previous methods on both synthetic and real speech data. A theoretical justification for the effectiveness of this new method is also explained. [Work supported by NIH.]

Full Text
Published version (Free)

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call