Segmentation of Speech into Syllables

Arthur N Stowe

doi:10.1121/1.2142549

Abstract

A computer program to segment speech into syllables is now a part of the Lincoln Laboratory speech-recognition system. It is intended to facilitate processing of polysyllabic words spoken in isolation and, eventually, connected speech. The program does not attempt to fix boundaries precisely in time. Evaluation of the results is made with reference to the aural impression that a sequence of speech sounds makes on the investigator, the aim being to ensure agreement between program and investigator for those segments at which the latter is certain that there is or is not a boundary, and to confine disagreements to those segments at which the investigator is uncertain. Decisions are made at three levels, the easiest being made at the first level. Syllable boundaries are first marked at transitions from voiced to voiceless segments, voicing being determined according to the ratio of amplitude in a lowpass band to the total speech amplitude. Then the stretches of continuously voiced speech thus marked are processed. Dips in the overall amplitude large enough to indicate unambiguously a syllable boundary are marked as boundaries. At the third level, the voiced segments defined by the boundaries thus far determined are further processed. More detailed characteristics of the spectrum are used to make decisions in the most difficult cases, vowel-semivowel-vowel combinations and the monosyllabic diphthongs.
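The first two decision levels described in the abstract can be illustrated with a minimal sketch. It is not the Lincoln Laboratory program: the per-frame amplitude lists, the function names, and both thresholds (`ratio_threshold`, `dip_fraction`) are assumptions chosen for illustration, since the abstract gives no numerical values. The third level is omitted because the abstract does not specify which spectral characteristics it uses.

```python
from typing import List, Tuple


def voiced_flags(lowpass: List[float], total: List[float],
                 ratio_threshold: float = 0.5) -> List[bool]:
    """Call a frame voiced when lowpass/total amplitude reaches a threshold.

    The threshold value is hypothetical; the abstract names only the ratio.
    """
    return [t > 0 and (lp / t) >= ratio_threshold
            for lp, t in zip(lowpass, total)]


def level1_boundaries(voiced: List[bool]) -> List[int]:
    """Level 1: mark a boundary at each voiced-to-voiceless transition."""
    return [i for i in range(1, len(voiced))
            if voiced[i - 1] and not voiced[i]]


def voiced_runs(voiced: List[bool]) -> List[Tuple[int, int]]:
    """Return (start, end) index pairs, end exclusive, of voiced stretches."""
    runs = []
    start = None
    for i, v in enumerate(voiced):
        if v and start is None:
            start = i
        elif not v and start is not None:
            runs.append((start, i))
            start = None
    if start is not None:
        runs.append((start, len(voiced)))
    return runs


def level2_boundaries(total: List[float], runs: List[Tuple[int, int]],
                      dip_fraction: float = 0.5) -> List[int]:
    """Level 2: mark deep dips in overall amplitude inside each voiced run.

    A frame counts as a dip when it is a local minimum and lies below
    dip_fraction times the smaller of the peaks on either side of it
    (an assumed criterion for "large enough").
    """
    boundaries = []
    for start, end in runs:
        for i in range(start + 1, end - 1):
            is_local_min = total[i] <= total[i - 1] and total[i] < total[i + 1]
            left_peak = max(total[start:i])
            right_peak = max(total[i + 1:end])
            if is_local_min and total[i] < dip_fraction * min(left_peak, right_peak):
                boundaries.append(i)
    return boundaries


def segment_levels_1_2(lowpass: List[float], total: List[float]) -> List[int]:
    """Combine level-1 and level-2 boundaries into one sorted list of frames."""
    voiced = voiced_flags(lowpass, total)
    found = set(level1_boundaries(voiced))
    found.update(level2_boundaries(total, voiced_runs(voiced)))
    return sorted(found)
```

Because the boundaries are reported as frame indices rather than exact times, the sketch is consistent with the abstract's statement that boundaries are not fixed precisely in time.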
