Abstract

Neural networks were trained to classify single 20 ms frames of vowels using either perceptually-based spectral representations or LPC spectra as input. Classification performance was compared with that of several distance measures applied under nearest-neighbor and mean-distance decision criteria. The non-network distance measures included the LPC-residual and cepstral distance measures used in conventional automatic speech recognition systems, as well as a formant-based measure and a new elastic distance measure that explicitly corrects for the effects of spectral tilt. Under an optimal error rate criterion, vowels were discriminated best by the elastic distance measure applied to the perceptually-based spectrum. Neural networks with LPC spectra as input performed comparably to the better conventional distance measures. Although networks trained with perceptually-based spectral inputs performed worse than networks trained with LPC spectra, the features represented by their hidden nodes were more consistent with factors related to human vowel perception.
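
The abstract does not give the distance measures or decision rules in detail. As a minimal sketch only, the following Python code illustrates one standard form of a cepstral distance (Euclidean distance between cepstral-coefficient vectors) together with the two decision criteria named above. Everything here is an assumption for illustration: the function names (cepstral_distance, remove_tilt, nearest_neighbor_label, mean_distance_label), the 12-dimensional vectors, and the random toy data are not from the paper, and remove_tilt shows only a generic least-squares tilt removal, not the paper's elastic distance measure.

```python
import numpy as np

def cepstral_distance(c_a, c_b):
    """Euclidean distance between truncated cepstral-coefficient vectors,
    a standard form of the cepstral distance in speech recognition."""
    return float(np.linalg.norm(np.asarray(c_a) - np.asarray(c_b)))

def remove_tilt(log_spectrum):
    """Generic spectral-tilt removal: subtract the least-squares line from a
    log-magnitude spectrum. (The paper's elastic measure is not shown here.)"""
    x = np.arange(len(log_spectrum))
    slope, intercept = np.polyfit(x, log_spectrum, 1)
    return log_spectrum - (slope * x + intercept)

def nearest_neighbor_label(frame, ref_frames, ref_labels, dist=cepstral_distance):
    """Nearest-neighbor criterion: label of the single closest reference frame."""
    d = [dist(frame, r) for r in ref_frames]
    return ref_labels[int(np.argmin(d))]

def mean_distance_label(frame, ref_frames, ref_labels, dist=cepstral_distance):
    """Mean-distance criterion: label of the class whose reference frames
    are closest on average."""
    classes = sorted(set(ref_labels))
    mean_d = {c: np.mean([dist(frame, r)
                          for r, l in zip(ref_frames, ref_labels) if l == c])
              for c in classes}
    return min(mean_d, key=mean_d.get)

# Toy usage with random "cepstral" vectors; real inputs would be
# coefficients computed from 20 ms vowel frames.
rng = np.random.default_rng(0)
refs = [rng.normal(size=12) for _ in range(6)]
labs = ["iy", "iy", "aa", "aa", "uw", "uw"]
test = refs[2] + 0.05 * rng.normal(size=12)  # perturbed "aa" reference
print(nearest_neighbor_label(test, refs, labs))  # expected: "aa"
print(mean_distance_label(test, refs, labs))     # expected: "aa"
```

Both criteria compare a test frame against labeled reference frames; they differ only in whether the decision rests on the single closest reference or on the average distance to each vowel class.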
