Abstract

This study compares the pattern of time-varying spectral change in a database of vowels spoken in hVd words by adults and by children aged 5 to 18 years. Measurements of vowel formant frequencies (F1, F2, and F3), mean fundamental frequency, and duration were used to train a pattern classifier in order to determine the optimum sampling locations for vowel classification. A series of linear discriminant analyses was carried out, using leave-one-out cross-validation to classify the test stimuli. These analyses differed in the temporal location(s) at which the formant frequencies were measured and in the number of sample points (1, 2, or 3). For all age and sex classes, classification accuracy was higher when two samples were used rather than a single frame, with a mean increase in accuracy of 10.8%. Adding a third sample point produced only marginal improvement, with less than 1% change overall. The highest classification accuracy was obtained when the first sample was taken relatively early in the vowel (around the 20% point) and the second around the 70% point, with relatively minor variations in classification scores across age and sex categories.
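As an illustration of the sampling scheme described above, the following is a minimal sketch of how a classifier input could be assembled from a formant track sampled at proportional time points. It is not the authors' implementation: the function names `sample_at` and `feature_vector`, the nearest-frame rounding rule, the ordering of the features, and the units are all assumptions made for this example. The default proportions (0.2 and 0.7) simply mirror the two sample points reported as giving the highest accuracy.

```python
from typing import List, Sequence, Tuple

# One analysis frame of formant measurements: (F1, F2, F3), assumed to be in Hz.
Frame = Tuple[float, float, float]


def sample_at(track: Sequence[Frame], proportion: float) -> Frame:
    """Return the frame nearest to `proportion` (0.0 to 1.0) of the vowel's duration.

    Hypothetical helper: nearest-frame selection is an assumption; the study
    does not state how sample points were mapped onto frames.
    """
    if not track:
        raise ValueError("empty formant track")
    if not 0.0 <= proportion <= 1.0:
        raise ValueError("proportion must lie in [0, 1]")
    index = round(proportion * (len(track) - 1))
    return track[index]


def feature_vector(
    track: Sequence[Frame],
    f0_hz: float,
    duration_ms: float,
    proportions: Sequence[float] = (0.2, 0.7),
) -> List[float]:
    """Build [f0, duration, F1..F3 at each sample point] for one vowel token.

    The feature ordering is an arbitrary choice for this sketch.
    """
    features = [f0_hz, duration_ms]
    for p in proportions:
        features.extend(sample_at(track, p))
    return features
```

Passing a single proportion, such as `proportions=(0.5,)`, yields the one-sample condition, and three proportions yield the three-sample condition. The resulting vectors could then be given to any linear discriminant classifier evaluated with leave-one-out cross-validation.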

