Abstract

Differences in vocal tract length (VTL) among individual speakers cause variations in the acoustic features of phonemes. In this paper, we propose a simple method to estimate speaker-specific VTLs and to quantitatively evaluate the speaker-normalization effects of VTLs. We used accumulated means of formant trajectories to estimate the VTLs of speakers ranging from children to adults. Formants were estimated with the inverse-filter control (IFC) system, which determines the analysis order automatically. The analysis order is the number of formants to be estimated. To evaluate the speaker-normalization effect of VTLs, we also propose a data-reduction method that finds the dense regions of vowel distributions in the formant space and represents them as ellipses. Using the areas of these ellipses, we evaluated three VTL normalizations: by the mean of all VTLs (the standard), by the mean VTL of each speaker category, and by individual VTLs. Relative to the standard area obtained from the original data, the ellipse area was reduced by 39.5% with the categorical means and by 46.6% with the individual VTLs. Using the proposed method, we then constructed a “normalized vowel map (NVM)” that visualizes universal vowel distributions as a core image of linguistic information. Finally, using the proposed methods, we compared the estimated VTLs with those obtained by another method based on magnetic resonance imaging (MRI) data.
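The abstract does not give the estimation formula or the data-reduction rule, so the following is only a minimal sketch under stated assumptions. It models the vocal tract as a uniform tube closed at the glottis and open at the lips, for which F_n = (2n - 1)c / (4L). From this it estimates a VTL from long-term mean formants and rescales formants to a reference length. It then measures a covariance-ellipse area after a hypothetical outlier-trimming step. The tube model, the value of c, the trimming rule, and the function names (`estimate_vtl`, `normalize_formants`, `trimmed_ellipse_area`, `percent_reduction`) are all illustrative assumptions. None of them is the IFC system or the authors' data-reduction method.

```python
import numpy as np

# Assumed speed of sound in the vocal tract (cm/s); a common textbook value.
SPEED_OF_SOUND_CM_S = 35000.0


def estimate_vtl(mean_formants_hz):
    """Estimate VTL (cm) from mean formants F1, F2, ... (Hz).

    Assumes a uniform tube closed at one end: F_n = (2n - 1) c / (4 L),
    so each formant gives L = (2n - 1) c / (4 F_n); the estimates are averaged.
    """
    f = np.asarray(mean_formants_hz, dtype=float)
    n = np.arange(1, f.size + 1)
    return float(np.mean((2 * n - 1) * SPEED_OF_SOUND_CM_S / (4.0 * f)))


def normalize_formants(formants_hz, vtl_cm, reference_vtl_cm):
    """Rescale formants to a reference VTL.

    In the uniform-tube model formants scale as 1/L, so F' = F * L / L_ref.
    """
    return np.asarray(formants_hz, dtype=float) * (vtl_cm / reference_vtl_cm)


def trimmed_ellipse_area(points, keep=0.9, k=2.0):
    """Area of the k-sigma covariance ellipse after hypothetical trimming.

    Repeatedly drops the point with the largest Mahalanobis distance until
    a fraction `keep` of the points remains, then returns
    pi * k^2 * sqrt(det(Sigma)) for the remaining 2-D points.
    """
    pts = np.asarray(points, dtype=float)
    target = max(3, int(np.ceil(keep * len(pts))))
    while len(pts) > target:
        diff = pts - pts.mean(axis=0)
        inv_cov = np.linalg.inv(np.cov(pts, rowvar=False))
        d2 = np.einsum("ij,jk,ik->i", diff, inv_cov, diff)
        pts = np.delete(pts, np.argmax(d2), axis=0)
    cov = np.cov(pts, rowvar=False)
    return float(np.pi * k**2 * np.sqrt(np.linalg.det(cov)))


def percent_reduction(area, standard_area):
    """Percentage by which `area` is smaller than `standard_area`."""
    return 100.0 * (1.0 - area / standard_area)
```

In this sketch, the normalization conditions differ only in which VTL is passed as `vtl_cm`: a speaker category's mean VTL or the speaker's own estimate. `percent_reduction` then compares each resulting ellipse area with the standard one. Trimming by maximum Mahalanobis distance was chosen only because it always shrinks the covariance determinant, which makes "dense region" well defined. The actual reduction criterion in the paper may differ.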
