Abstract

Improvements have been made to the vowel articulation training aid described in a previous paper [Zimmer, Zahorian, and Auberg, J. Acoust. Soc. Am. 101, 3199(A) (1997)]. The system uses a standard Windows 95/NT-compatible sound card on a multimedia PC to provide continuous feedback about articulation for ten American English monophthong vowels in two modes: an F1/F2-style "ellipse" display and a vowel bargraph display. Neural network discriminative classifiers produce the display outputs from 12 features. Although testing showed that the system provided useful output for most voiced speech sounds, the network would classify some out-of-category sounds (e.g., nonvowel sounds, ambient noise) as belonging to one of the ten vowel categories. Experimental tests indicate that including a generalized Euclidean distance measure, which compares the feature values of an utterance with the average feature values for the vowel category specified by the neural network output, helps to greatly reduce the number of out-of-category sounds improperly classified as "correct" responses. The paper will describe the processing in more detail and summarize experimental results from vowel classification tests. [Funded by NSF, Grant No. NSF-BES-9411607.]
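The rejection step described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes the "generalized Euclidean distance" is a per-feature variance-weighted distance, and all function names, the distance threshold, and the feature statistics are hypothetical.

```python
import numpy as np

def generalized_euclidean_distance(features, mean, std):
    # Variance-weighted (generalized) Euclidean distance between an
    # utterance's feature vector and a vowel category's mean vector.
    # Dividing by the per-feature standard deviation is one common
    # reading of "generalized"; the paper's exact weighting may differ.
    return np.sqrt(np.sum(((features - mean) / std) ** 2))

def classify_with_rejection(features, category_means, category_stds,
                            network_choice, threshold=3.0):
    # The neural network first picks a vowel category (network_choice).
    # The distance check then rejects the input as out-of-category if it
    # lies too far from that category's average feature values.
    d = generalized_euclidean_distance(features,
                                       category_means[network_choice],
                                       category_stds[network_choice])
    return network_choice if d <= threshold else None  # None = rejected
```

With 12-dimensional feature vectors, an input near a category's mean is accepted, while a distant input (e.g., ambient noise) is rejected rather than being reported as a "correct" vowel.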
