Speech categorization using recurrent networks

Sven Anderson, Robert Port, John Merrill

doi:10.1121/1.2025398

Abstract

Several connectionist networks were trained to classify the English syllables ba, da, ga, pa, ta, and ka, collected from two male and two female speakers. A speech preprocessor computed perceptually based spectral patterns [H. Hermansky, Proc. ICASSP 87, 1159–1162 (1987)] every 5 ms. A sequential network having a limited class of recurrent connections [M. Jordan, ICS Tech. Rep., University of California at San Diego (1986)] was employed to categorize the data. The networks were trained by back propagation or second-order back propagation and were required to produce a linear increase in the certainty of classification over the course of the syllable. Performance of the sequential networks was evaluated on both “known” and “unknown” speakers. When tested on novel tokens from a known speaker, the sequential network did very well, but it did very poorly on tokens from an unknown speaker. Sequential networks trained with back propagation are capable of integrating cues distributed over time and using them to categorize data. However, their learned behavior may not generalize to data from other speakers. [Work supported by NSF.]
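The setup described in the abstract can be illustrated with a minimal sketch of a Jordan-style forward pass and a linearly rising training target. Everything specific here is an assumption made for illustration, not a detail taken from the abstract: the layer sizes, the decay constant on the state units, the sigmoid units, the random weights, and the reading of "linear increase in the certainty of classification" as a target that ramps from 0 to 1 for the correct class. Training is omitted, and the names `ramp_targets` and `jordan_forward` are hypothetical.

```python
import math
import random

N_CLASSES = 6  # ba, da, ga, pa, ta, ka


def ramp_targets(n_frames, label, n_classes=N_CLASSES):
    """Per-frame targets: the correct class rises linearly from 0 to 1."""
    targets = []
    for t in range(n_frames):
        row = [0.0] * n_classes
        row[label] = t / (n_frames - 1) if n_frames > 1 else 1.0
        targets.append(row)
    return targets


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def layer(inputs, weights):
    """Sigmoid layer; weights[j] holds the incoming weights of unit j."""
    return [sigmoid(sum(w * v for w, v in zip(row, inputs))) for row in weights]


def jordan_forward(frames, w_hidden, w_out, decay=0.5):
    """Forward pass in which state units carry a decayed sum of past outputs.

    The decay value is an assumed constant, not one reported in the abstract.
    """
    state = [0.0] * len(w_out)
    outputs = []
    for frame in frames:
        # Hidden units see the current spectral frame plus the state units.
        hidden = layer(list(frame) + state, w_hidden)
        output = layer(hidden, w_out)
        # State update: self-recurrence plus a copy of the current output.
        state = [decay * s + o for s, o in zip(state, output)]
        outputs.append(output)
    return outputs


def random_weights(n_units, n_inputs, rng):
    return [[rng.uniform(-0.5, 0.5) for _ in range(n_inputs)]
            for _ in range(n_units)]


# Hypothetical sizes: 8 spectral bands, 5 hidden units, 40 frames (200 ms at 5 ms).
rng = random.Random(0)
n_bands, n_hidden, n_frames = 8, 5, 40
frames = [[rng.random() for _ in range(n_bands)] for _ in range(n_frames)]
w_hidden = random_weights(n_hidden, n_bands + N_CLASSES, rng)
w_out = random_weights(N_CLASSES, n_hidden, rng)

outputs = jordan_forward(frames, w_hidden, w_out)
targets = ramp_targets(n_frames, label=1)  # label 1 stands for "da" here
```

In a training loop, the per-frame error between `outputs` and `targets` would be what back propagation minimizes. The ramp asks for low certainty early in the syllable and full certainty only at its end.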
