Recognition strategies in a continuous speech understanding system

Joseph J Mariani

doi:10.1121/1.386269

Abstract

ESOPE0, the first version of our speech recognition system, uses a top‐down strategy from the pragmatic level to the phonetic level, and operates from left to right with a best‐few method and no backtracking. Dynamic comparison among the four best phoneme candidates is carried out. ESOPE1 applies the same basic strategy systematically: the best‐few algorithm becomes a beam‐search procedure. ESOPE1‐1 extends the top‐down treatment to the acoustic level with a diphone dictionary, and uses a dynamic comparison method at that level. In our automatic dictation project, which uses a natural language syntax and a 170 000‐form vocabulary, a bottom‐up, best‐few approach has been taken to translate an error‐free continuous phoneme string into words. We therefore feel that a severely limited language and poor phoneme recognition call for a top‐down strategy, whereas a bottom‐up strategy is preferable in the opposite situation. This, together with recent results in psycholinguistics, leads us, in our present development of ESOPE2, to use both a top‐down and a bottom‐up strategy (Prediction‐Verification‐Induction). Predictions are made at each level, but the recognized phonemes may introduce unpredicted words, which allows limited learning abilities.
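The left-to-right, best-few search without backtracking described in the abstract can be illustrated with a minimal sketch. This is not the ESOPE implementation: the function name `best_few_search`, the per-position score dictionaries, the one-character phoneme symbols, and the toy lexicon are all assumptions made for illustration. The lexicon plays the role of the top-down constraint, and the beam width of four only echoes the "four best phoneme candidates" mentioned above.

```python
from typing import Dict, List, Tuple


def best_few_search(
    frames: List[Dict[str, float]],
    lexicon: List[str],
    beam_width: int = 4,
) -> List[Tuple[str, float]]:
    """Left-to-right best-few (beam) search with no backtracking.

    frames: hypothetical per-position phoneme scores, higher is better.
    lexicon: allowed phoneme strings, used as the top-down constraint.
    """
    # Every prefix of a lexicon entry is a legal partial hypothesis.
    prefixes = {word[:i] for word in lexicon for i in range(len(word) + 1)}
    beam: List[Tuple[str, float]] = [("", 0.0)]
    for scores in frames:
        extended = []
        for hyp, total in beam:
            for phoneme, score in scores.items():
                candidate = hyp + phoneme
                if candidate in prefixes:  # top-down prediction filter
                    extended.append((candidate, total + score))
        # Keep only the best few; pruned hypotheses are never revisited.
        extended.sort(key=lambda item: item[1], reverse=True)
        beam = extended[:beam_width]
    # Report only hypotheses that form a complete lexicon entry.
    return [(hyp, total) for hyp, total in beam if hyp in lexicon]


# Toy input: three positions, two phoneme candidates each (assumed scores).
frames = [
    {"k": 0.9, "g": 0.6},
    {"a": 0.8, "o": 0.7},
    {"t": 0.9, "d": 0.5},
]
lexicon = ["kat", "kot", "god"]
results = best_few_search(frames, lexicon, beam_width=2)
print([hyp for hyp, _ in results])
```

With a beam width of two in this toy input, the partial hypothesis "go" is pruned at the second position and "god" can no longer be recovered, which is the cost of giving up backtracking in exchange for a bounded search.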
