Abstract

Intonational Phonology deals with the systematic way in which speakers use pitch to add appropriate emphasis to the underlying string of words in an utterance. Two widely discussed aspects of pitch are pitch accents and boundary events. These provide insight into the sentence type, speaker attitude, linguistic background, and other aspects of prosodic form. The main hurdle, however, is the difficulty of obtaining annotations of these attributes in “real” speech. Besides being language dependent, these attributes are known to be subjective and prone to high inter-annotator disagreement. Our investigations aim to automatically derive phonological aspects of intonation from large speech databases. Recurring and salient patterns in the pitch contours, observed jointly with an underlying linguistic context, are automatically detected. Our computational framework unifies complementary paradigms such as the physiological Fujisaki model, Autosegmental-Metrical phonology, and elegant pitch stylization, to automatically (i) discover phonologically atomic units to describe the pitch contours and (ii) build inventories of tones and long-term trends appropriate for the given speech database, whether a large multi-speaker database or a single-speaker one such as an audiobook. We successfully demonstrate the framework in expressive speech synthesis. There is also immense potential for the approach in speaker, style, and language characterization.
