Word boundary detection in broad class and phoneme strings

Jonathan Harrington

doi:10.1016/0885-2308(89)90004-1

Abstract

This paper examines how many word boundaries can be detected from sequences of phonemes and broad classes in transcriptions of continuous speech. In the first part of the paper, word boundaries are detected from sequences of three phonemes that occur across word boundaries but are excluded word-internally. When such sequences are matched against phonemic transcriptions of 145 utterances, around 37% of all word boundaries are correctly identified. When the same transcriptions are represented by broad classes rather than phonemes, knowledge of sequences that span word boundaries but do not occur word-internally is almost completely ineffective for word boundary detection. Instead, a version of the model discussed in Cutler & Norris (1988), based on the distinction between “strong” and “weak” vowels, enables over 40% of word boundaries to be correctly located at the broad class level, although many word boundaries are also inserted at inappropriate points. The implications of these word boundary detection strategies for models of lexical access in a continuous speech recognizer are also discussed.
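
The three-phoneme strategy described in the abstract can be illustrated with a minimal sketch. It is not the paper's implementation: the toy lexicon, the phoneme symbols, and the function names `word_internal_trigrams` and `boundary_spans` are all assumptions made for illustration. The sketch collects every three-phoneme sequence attested inside a word, then flags any sequence in an unsegmented utterance that is not in that set. Such a sequence must contain at least one word boundary, although the sketch does not decide which of its two internal positions the boundary occupies.

```python
from typing import Iterable, List, Sequence, Set, Tuple

Trigram = Tuple[str, ...]


def word_internal_trigrams(lexicon: Iterable[Sequence[str]]) -> Set[Trigram]:
    """Collect every three-phoneme sequence that occurs inside some word."""
    internal: Set[Trigram] = set()
    for phones in lexicon:
        for i in range(len(phones) - 2):
            internal.add(tuple(phones[i:i + 3]))
    return internal


def boundary_spans(utterance: Sequence[str], internal: Set[Trigram]) -> List[int]:
    """Return start indices of trigrams that never occur word-internally.

    Each flagged trigram must straddle at least one word boundary.
    """
    return [
        i
        for i in range(len(utterance) - 2)
        if tuple(utterance[i:i + 3]) not in internal
    ]


# Hypothetical toy lexicon (two words) and an unsegmented two-word utterance.
TOY_LEXICON = [("k", "a", "t"), ("s", "a", "t")]
INTERNAL = word_internal_trigrams(TOY_LEXICON)
UTTERANCE = ["k", "a", "t", "s", "a", "t"]

SPANS = boundary_spans(UTTERANCE, INTERNAL)
print(SPANS)  # [1, 2]
```

In this toy case the flagged trigrams start at positions 1 and 2, and the only boundary position they share is the one between the third and fourth phonemes, which is the true word boundary. With a realistic lexicon, many cross-boundary sequences also occur word-internally and so are not flagged, which is consistent with only a portion of boundaries being detectable this way.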
