A statistical approach to automatic speech recognition using the atomic speech units constructed from overlapping articulatory features

Don X. Sun, Li Deng

doi:10.1121/1.409839

Abstract

In recent years, the development of a feature-based general statistical framework has been pursued for automatic speech recognition via novel designs of minimal or atomic units of speech, aiming at a parsimonious scheme to share the interword and interphone speech data and at a unified way to account for the context-dependent behaviors in speech. The basic design philosophy has been motivated by the theory of distinctive features and by a new form of phonology which argues for the use of multidimensional articulatory structures. In this paper, the most recently developed feature-based recognizer is presented, which is capable of operating on all classes of English sounds. Detailed descriptions of the design considerations for the recognizer and of key aspects of the design process are provided. This process, which is called lexicon "compilation," consists of three elements: (1) establishing a feature-specification system; (2) constructing a probabilistic and fractional temporal overlapping pattern across the features; and (3) mapping from the feature-overlap pattern to a state-transition graph. A standard phonetic classification task from the TIMIT database is used as a test bed to evaluate the performance of the recognizer. The experimental results provide preliminary evidence for the effectiveness of the feature-based approach to speech recognition.
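
The three compilation elements can be illustrated with a toy sketch. It is not the authors' implementation: the feature table, the feature names, the function `compile_word`, and the `max_lead` parameter are all hypothetical, and the overlap rule is a simple deterministic bound rather than the probabilistic, fractional overlap pattern described in the abstract. Each phone is given a tuple of feature values, each feature dimension is allowed to run ahead of the others by a bounded number of phones, and every reachable combination becomes a state in a transition graph.

```python
from itertools import product

# (1) Hypothetical feature specification: phone -> (velum, place) values.
# These labels are illustrative only, not the paper's feature system.
FEATURE_TABLE = {
    "m": ("nasal", "labial"),
    "a": ("oral", "open"),
}


def compile_word(phones, feature_table, max_lead=1):
    """Toy lexicon 'compilation' of one word into a state-transition graph.

    A state is a tuple of positions, one per feature dimension, saying which
    phone in the word currently supplies that feature's value.
    """
    specs = [feature_table[p] for p in phones]
    n_phones, n_feats = len(specs), len(specs[0])

    # (2) Overlap pattern: features may be out of step with each other,
    # but by at most `max_lead` phones (an assumed, simplified constraint).
    def is_valid(pos):
        return max(pos) - min(pos) <= max_lead

    states = [pos for pos in product(range(n_phones), repeat=n_feats)
              if is_valid(pos)]

    # (3) State-transition graph: in one step, any non-empty subset of
    # features advances to the next phone, staying within the overlap bound.
    edges = set()
    for pos in states:
        for step in product((0, 1), repeat=n_feats):
            if not any(step):
                continue  # skip the "nothing moves" step
            nxt = tuple(i + s for i, s in zip(pos, step))
            if max(nxt) < n_phones and is_valid(nxt):
                edges.add((pos, nxt))

    def bundle(pos):
        """Feature values that a given state stands for."""
        return tuple(specs[i][d] for d, i in enumerate(pos))

    return states, edges, bundle


states, edges, bundle = compile_word(["m", "a"], FEATURE_TABLE, max_lead=1)
for s in states:
    print(s, bundle(s))
```

In this sketch, setting `max_lead=0` forces all features to change together, so the graph collapses to an ordinary phone-by-phone chain; larger values add intermediate states in which one feature has already taken the next phone's value while another has not.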
