Abstract

The present communication concerns the use of prosodic parameters in automatic speech recognition (ASR). It addresses two questions: whether prosodic information can be extracted automatically from a set of acoustic measurements made on the signal, and how integrating such information affects ASR performance. Prosodic parameters include pauses and contrasts in pitch, duration and intensity between successive segments (mainly the vocalic parts). The notion is further extended to the number of syllables and to the ratio of voiced to unvoiced portions of a word. Part one introduces the various aspects of prosody (linguistic and non-linguistic) and the main problems to be solved in automatically extracting the linguistic messages conveyed by prosodic features. Part two deals with the word level and lexical search. It presents work on (1) the feasibility of word stress detection (detecting primary stress, estimating its magnitude, and evaluating the complete word stress pattern) and (2) estimating how strongly stress information constrains lexical search when combined with other suprasegmental information (number of syllables, word boundaries, ratio of voiced to unvoiced portions of the word, etc.). Part three deals with the phrase and sentence levels and with the syntactic constraints provided by automatically detecting word, phrase and sentence boundaries. Part four describes several miscellaneous uses at the phonemic level: phonetic segmentation, identification of the voicing feature of consonants, and estimation of the "segmental quality" of the underlying segments.
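The abstract does not say how the amount of lexical constraint is quantified. One common measure is the expected size of the equivalence class (cohort) of words that remain compatible with the observed suprasegmental features. The sketch below illustrates that idea under stated assumptions: the toy lexicon, its stress annotations and the function name `expected_class_size` are all hypothetical, every word is assumed equally likely, and none of this is presented as the author's method.

```python
from collections import Counter

# Hypothetical toy lexicon: word -> (number of syllables, index of the
# primary-stressed syllable). Illustrative entries only, not data from the paper.
LEXICON = {
    "record": (2, 0),
    "report": (2, 1),
    "table": (2, 0),
    "begin": (2, 1),
    "computer": (3, 1),
    "telephone": (3, 0),
    "banana": (3, 1),
    "cat": (1, 0),
}


def expected_class_size(lexicon, key):
    """Expected number of candidate words left after observing key(word, feats).

    Assumes each lexicon word is equally likely to be spoken. A class of size c
    is hit with probability c / n and then leaves c candidates, so the
    expectation is sum(c * c) / n.
    """
    classes = Counter(key(word, feats) for word, feats in lexicon.items())
    n = len(lexicon)
    return sum(c * c for c in classes.values()) / n


# No prosodic information: every word stays a candidate.
no_info = expected_class_size(LEXICON, lambda w, f: None)
# Syllable count only.
by_syllables = expected_class_size(LEXICON, lambda w, f: f[0])
# Syllable count plus primary-stress position.
by_pattern = expected_class_size(LEXICON, lambda w, f: f)
print(no_info, by_syllables, by_pattern)
```

On this toy lexicon the expected cohort shrinks from 8 words with no information, to 3.25 with the syllable count alone, to 1.75 once the primary-stress position is added. Other cues listed in the abstract (word boundaries, voiced-to-unvoiced ratio) could be folded into the `key` function in the same way.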
