Abstract

Two durational cues for prosody are examined for utterances in the Radio Speech Corpus: the pause length and the speech rate associated with each syllabic segment. Pauses are defined as intervals in which the probability of voicing and the RMS energy fall below their respective thresholds. Speech rate is defined as the reciprocal of the duration of the syllabic segment, without normalization for speaker or segment identity. Each syllabic segment was assigned a prosodic marker combining degree of accent and boundary type, i.e., {unaccented, accented} × {nonboundary, intermediate boundary, intonational boundary}. Distributions of pause length and speech rate for each marker type show that, in general, mean pause length increases and speech rate decreases as boundary strength increases from nonboundary to intonational boundary. Accented syllabic segments at intermediate or intonational boundaries showed longer associated pauses and slower speech rates than unaccented segments. Among the intonational boundary tone types, mean pause length decreases in the order L-L%, H-H%, H-L%, L-H%. Preliminary classification results using the two durational cues show detection rates for intonational boundaries of around 75%, with insertion rates of around 25%.
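The two cue definitions can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the frame-based representation, the frame step, and the threshold values are all assumptions made for illustration, since the abstract does not state them. A frame counts as part of a pause only when both the voicing probability and the RMS energy fall below their thresholds, and speech rate is the unnormalized reciprocal of segment duration.

```python
from typing import List, Sequence, Tuple


def find_pauses(
    voicing_prob: Sequence[float],
    rms: Sequence[float],
    frame_s: float = 0.01,        # assumed frame step in seconds
    voicing_thresh: float = 0.5,  # assumed threshold, not from the paper
    rms_thresh: float = 0.01,     # assumed threshold, not from the paper
) -> List[Tuple[float, float]]:
    """Return (start_s, end_s) intervals where both cues are below threshold."""
    pauses: List[Tuple[float, float]] = []
    start = None  # index of the first frame of the current pause, if any
    for i, (p, e) in enumerate(zip(voicing_prob, rms)):
        silent = p < voicing_thresh and e < rms_thresh
        if silent and start is None:
            start = i
        elif not silent and start is not None:
            pauses.append((start * frame_s, i * frame_s))
            start = None
    if start is not None:
        # The pause runs to the end of the analyzed frames.
        n_frames = min(len(voicing_prob), len(rms))
        pauses.append((start * frame_s, n_frames * frame_s))
    return pauses


def speech_rate(segment_start_s: float, segment_end_s: float) -> float:
    """Reciprocal of the syllabic segment duration, with no normalization."""
    duration = segment_end_s - segment_start_s
    if duration <= 0:
        raise ValueError("segment duration must be positive")
    return 1.0 / duration
```

Under these assumptions, pause length for a segment is the difference between the end and start of the pause interval associated with it, and a 0.25 s syllabic segment has a speech rate of 4 segments per second.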
