Abstract

A computational algorithm is presented which locates syllabic boundaries in connected speech. Digitized speech is processed in two stages. The first stage is an auditory front end that filters the digitized speech into critical bands. The critical bands are amplitude-scaled according to a standard frequency/amplitude function, and low-frequency amplitude modulations in the 2–30-Hz range are then emphasized according to a perceptual modulation sensitivity function. Next, the critical bands are low-pass filtered at 100 Hz, decimated at a rate of 25:1, and low-pass filtered again, creating acoustic envelopes of the processed speech, one for each critical band. During the second stage of processing, an autocorrelation-based algorithm is run on the envelopes. The local minima of this algorithm's output are pooled across the critical-band envelopes, yielding syllabic boundaries. The utterances used to develop and test the algorithm were taken from the Harvard Phonetically Balanced Sentences. Currently, the algorithm places boundaries within a few tens of milliseconds of where a phonemic syllabification would place them. Work continues on fine-tuning the algorithm; a further goal is to compare its performance against that of human listeners. [The author wishes to acknowledge the support of Lucent Technologies in conducting this project.]
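
The abstract does not give implementation details, but the processing chain it describes can be sketched in Python with NumPy and SciPy. In the sketch below, the Zwicker critical-band edges, the flat amplitude and modulation weightings, the single-lag windowed autocorrelation statistic, and the majority-vote pooling of per-band minima are all illustrative assumptions standing in for the unspecified frequency/amplitude function, the perceptual modulation sensitivity function, and the author's actual autocorrelation algorithm:

import numpy as np
from scipy.signal import butter, sosfiltfilt, decimate, argrelmin

FS = 16000          # assumed input sampling rate (Hz)
ENV_FS = FS // 25   # 640-Hz envelope rate after the 25:1 decimation

# Zwicker critical-band edges (Hz) up to 6.4 kHz
BARK_EDGES = [100, 200, 300, 400, 510, 630, 770, 920, 1080, 1270,
              1480, 1720, 2000, 2320, 2700, 3150, 3700, 4400, 5300, 6400]

def band_sos(lo, hi, fs, order=4):
    return butter(order, [lo, hi], btype="bandpass", fs=fs, output="sos")

def critical_band_envelopes(x, fs=FS):
    # Stage 1: critical-band filtering, 2-30-Hz modulation emphasis,
    # 100-Hz low-pass, 25:1 decimation, 100-Hz low-pass again.
    envs = []
    for lo, hi in zip(BARK_EDGES[:-1], BARK_EDGES[1:]):
        band = sosfiltfilt(band_sos(lo, hi, fs), x)       # one critical band
        env = np.abs(band)                                # rectified amplitude envelope
        # Crude modulation emphasis: boost the 2-30-Hz component of the
        # envelope (flat weighting stands in for the sensitivity function).
        env = env + sosfiltfilt(band_sos(2.0, 30.0, fs), env)
        env = sosfiltfilt(butter(4, 100.0, fs=fs, output="sos"), env)
        env = decimate(decimate(env, 5), 5)               # 25:1, in two stages
        env = sosfiltfilt(butter(4, 100.0, fs=fs // 25, output="sos"), env)
        envs.append(env)
    return np.vstack(envs)

def short_time_autocorr(env, win=64, lag=8):
    # Normalized autocorrelation at a single lag (~12.5 ms) over a sliding
    # ~100-ms window; the published statistic is not specified in the abstract.
    out = np.empty(len(env) - win - lag)
    for i in range(len(out)):
        a, b = env[i:i + win], env[i + lag:i + lag + win]
        out[i] = (a @ b) / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12)
    return out

def syllable_boundaries(envs, env_fs=ENV_FS, min_bands=None):
    # Stage 2: find per-band autocorrelation minima and pool them across bands.
    n_bands, n = envs.shape
    if min_bands is None:
        min_bands = n_bands // 2                          # majority vote (assumption)
    votes = np.zeros(n, dtype=int)
    for env in envs:
        for m in argrelmin(short_time_autocorr(env), order=6)[0]:
            votes[max(0, m - 3):m + 4] += 1               # tolerate band misalignment
    hits = np.flatnonzero(votes >= min_bands)
    bounds, group = [], []
    for h in hits:                                        # merge adjacent frames
        if group and h - group[-1] > 1:
            bounds.append(np.mean(group))
            group = []
        group.append(h)
    if group:
        bounds.append(np.mean(group))
    return np.array(bounds) / env_fs                      # boundary times (s)

At the 640-Hz envelope rate each frame is about 1.6 ms, so smearing votes over plus or minus three frames tolerates roughly 5 ms of misalignment between bands; the window sizes and the pooling threshold are placeholders that would need tuning against hand-syllabified reference data.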
