Detection of phonological features in continuous speech using neural networks

Simon King, Paul Taylor

doi:10.1006/csla.2000.0148

Abstract

We report work on the first component of a two-stage speech recognition architecture based on phonological features rather than phones. This paper reports experiments on three phonological feature systems: (1) the Sound Pattern of English (SPE) system, which uses binary features, (2) a multi-valued (MV) feature system, which uses traditional phonetic categories such as manner, place, etc., and (3) Government Phonology (GP), which uses a set of structured primes. All experiments used recurrent neural networks to perform feature detection. In these networks the input layer is a standard framewise cepstral representation, and the output layer represents the values of the features. The system effectively produces a representation of the most likely phonological features for each input frame. All experiments were carried out on the TIMIT speaker-independent database. The networks performed well in all cases, with the average accuracy for a single feature ranging from 86% to 93%. We describe these experiments in detail, and discuss the justification and potential advantages of using phonological features rather than phones as the basis for speech recognition.

Full Text