Abstract

Vocal emotions, as well as different speaking styles and speaker traits, are characterized by a complex interplay of multiple prosodic features. Natural-sounding speech synthesis with the ability to control such paralinguistic aspects requires the manipulation of the corresponding prosodic features. With traditional concatenative speech synthesis, it is easy to manipulate the “primary” prosodic features of pitch, duration, and intensity, but it is very hard to individually control “secondary” prosodic features such as phonation type, vocal tract length, articulatory precision, and nasality. These secondary features can be controlled more directly with parametric synthesis methods. In the present study, we analyze the ability of articulatory speech synthesis to control secondary prosodic features by rule. To this end, nine German words were re-synthesized with the software VocalTractLab 2.1 and then manipulated in different ways at the articulatory level to vary vocal tract length, articulatory precision, and degree of nasality. Listening tests showed that most of the intended prosodic manipulations could be reliably identified, with recognition rates between 77% and 96%. Only the manipulations intended to increase articulatory precision were rarely recognized. The results suggest that rule-based manipulations in articulatory synthesis are generally sufficient for the convincing synthesis of secondary prosodic features at the word level.

