Abstract

Four classifiers were examined for their ability to identify nine monophthongal American English vowels. The classifiers, (1) Bayesian, (2) a standard back-propagation neural net with one hidden layer, (3) a modified ellipse method, and (4) an automatic region-drawing method, operated on two-dimensional vowel representations. Additionally, three different types of two-dimensional data were evaluated: (a) (log F1, log F2); (b) (norm log F1, norm log F2), normalized by a sensory reference [Miller, J. Acoust. Soc. Am. 85, 2114–2134 (1989)]; and (c) (x’,y’) of the auditory-perceptual space [Miller, J. Acoust. Soc. Am. 85, 2114–2134 (1989)]. Our corpus of 2304 vowels spoken in a CVC context by male and female talkers includes stress and speaking-rate variations [Fourakis, J. Acoust. Soc. Am. 90, 1816–1827 (1991)]. For each vowel utterance and each data type, a single two-dimensional data point is computed from the vocalic segment selected by Fourakis. Separate training (75% of the data) and testing (25%) subsets of the corpus were used in a jackknife procedure. In general, all the classifiers except the ellipse method performed similarly, and obtained the highest scores using (x’,y’) data. However, unless training and testing of the classifiers is restricted to vowels that are relatively steady-state, identification scores are less than ideal (<90%). [Work supported by NIDCD.]
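The evaluation procedure described above can be sketched in code. The following is a hypothetical illustration, not the authors' implementation: a minimal diagonal-Gaussian ("Bayesian") classifier scored with repeated 75%/25% train/test splits, analogous in spirit to the jackknife procedure in the abstract. The two-dimensional points stand in for (log F1, log F2) vowel tokens; all data, class labels, and function names here are synthetic assumptions for demonstration only.

```python
import math
import random

def fit_gaussians(points, labels):
    """Per-class mean and diagonal variance in two dimensions."""
    stats = {}
    for lab in set(labels):
        xs = [p for p, l in zip(points, labels) if l == lab]
        n = len(xs)
        mean = [sum(c) / n for c in zip(*xs)]
        var = [sum((v - m) ** 2 for v in c) / n + 1e-9
               for c, m in zip(zip(*xs), mean)]
        stats[lab] = (mean, var)
    return stats

def classify(stats, p):
    """Pick the class maximizing the diagonal-Gaussian log-likelihood."""
    def loglik(mean, var):
        return -sum(math.log(v) / 2 + (x - m) ** 2 / (2 * v)
                    for x, m, v in zip(p, mean, var))
    return max(stats, key=lambda lab: loglik(*stats[lab]))

def split_score(points, labels, train_frac=0.75, seed=0):
    """One 75/25 split: train on 75% of the tokens, score the held-out 25%."""
    rng = random.Random(seed)
    idx = list(range(len(points)))
    rng.shuffle(idx)
    cut = int(train_frac * len(idx))
    tr, te = idx[:cut], idx[cut:]
    stats = fit_gaussians([points[i] for i in tr], [labels[i] for i in tr])
    correct = sum(classify(stats, points[i]) == labels[i] for i in te)
    return correct / len(te)

# Synthetic two-vowel example: two well-separated clusters in 2-D
# (real vowel categories overlap far more, which is why the abstract
# reports scores below 90% for non-steady-state tokens).
random.seed(1)
pts, labs = [], []
for lab, (mx, my) in [("iy", (0.0, 0.0)), ("aa", (3.0, 3.0))]:
    for _ in range(100):
        pts.append((mx + random.gauss(0, 0.5), my + random.gauss(0, 0.5)))
        labs.append(lab)

# Average identification score over several random splits.
score = sum(split_score(pts, labs, seed=s) for s in range(4)) / 4
```

Averaging over several randomized splits, as sketched here, is one common way to realize the repeated train/test partitioning the abstract describes; the paper's exact partitioning scheme may differ.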
