Abstract

Parametric representations of the vocal tract area function have long been of interest in speech production research because of their close relationship with speech acoustics. Many representations are possible, depending on which qualities are considered desirable. For instance, several representations have been developed that trade articulatory interpretability for low dimensionality and other useful mathematical properties, such as orthogonality. It has not been well established whether these abstract representations are related to inherent constraints on vocal tract deformation (e.g., articulatory subspaces), or whether they can achieve high accuracy with a small number of parameters when applied to area functions from large amounts of continuous speech. The present study approaches these issues by extracting a data-driven basis representation for vocal tract area functions and then demonstrating the relationship between that basis and previously proposed spatial Fourier bases. Analysis was performed on more than a quarter-million individual area functions extracted from the USC-TIMIT database using region-of-interest analysis. Results suggest that a spatial Fourier basis with only four harmonics provides over 90% accuracy, and that one additional basis function corresponding to labial aperture increases representation accuracy to near 100%.
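To make the idea of a truncated spatial Fourier representation concrete, the following is a minimal sketch and not the study's actual procedure. It assumes a half-range cosine basis sampled at uniformly spaced points along the vocal tract, uses a synthetic area function rather than USC-TIMIT data, and treats the fraction of variance explained as a stand-in for "accuracy". The function names, the number of sample points, and the test signal are all hypothetical.

```python
import math


def cosine_basis(num_points, num_harmonics):
    """Rows are sampled cosines: row 0 is the constant term, rows 1..K the harmonics.

    Assumption: a DCT-II style grid (n + 0.5) / num_points, which makes the rows
    mutually orthogonal as long as num_harmonics < num_points.
    """
    return [
        [math.cos(math.pi * k * (n + 0.5) / num_points) for n in range(num_points)]
        for k in range(num_harmonics + 1)
    ]


def project(area, num_harmonics):
    """Return (coefficients, reconstruction) for a truncated cosine expansion."""
    n_pts = len(area)
    basis = cosine_basis(n_pts, num_harmonics)
    coeffs = []
    for k, row in enumerate(basis):
        # Squared norm of each basis row on this grid: N for k = 0, N / 2 otherwise.
        norm = n_pts if k == 0 else n_pts / 2.0
        coeffs.append(sum(a * b for a, b in zip(area, row)) / norm)
    recon = [sum(c * row[n] for c, row in zip(coeffs, basis)) for n in range(n_pts)]
    return coeffs, recon


def variance_explained(area, recon):
    """Fraction of the variance about the mean captured by the reconstruction."""
    mean = sum(area) / len(area)
    total = sum((a - mean) ** 2 for a in area)
    resid = sum((a - r) ** 2 for a, r in zip(area, recon))
    return 1.0 - resid / total


# Hypothetical area function (arbitrary units) sampled at 40 points from glottis to lips.
# It contains harmonics 1, 3, and 6, so a four-harmonic basis cannot capture all of it.
num_points = 40
area = [
    3.0
    + 1.5 * math.cos(math.pi * 1 * (n + 0.5) / num_points)
    - 0.8 * math.cos(math.pi * 3 * (n + 0.5) / num_points)
    + 0.3 * math.cos(math.pi * 6 * (n + 0.5) / num_points)
    for n in range(num_points)
]

coeffs, recon = project(area, num_harmonics=4)
score = variance_explained(area, recon)
print("coefficients:", [round(c, 3) for c in coeffs])
print("variance explained with 4 harmonics:", round(score, 4))
```

Because the sampled cosines are orthogonal on this grid, each coefficient is a simple inner product and no least-squares solve is needed. A localized feature that is poorly captured by a few low-order harmonics would show up in the residual, which is consistent in spirit with the abstract's report that one additional labial-aperture basis function accounts for most of what four harmonics miss.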
