Modelling acoustic parameters of prosody for read and acted-speech synthesis

Milan Rusko, Marián Trnka, Juraj Hamar, Richard Kováč, Sakhia Darjaa

doi:10.1121/1.2933269

Abstract

The prosody model is one of the most important parts of every speech synthesizer and mainly influences its naturalness. The intonation contour and phoneme lengths (together with speech quality) carry much of the extralinguistic and paralinguistic information contained in synthesized speech. The features reflecting the personality, mood and emotions of the speaker interact strongly with those reflecting speaking style. Nevertheless, an appropriate choice of prosody model and training material makes it possible to create a dedicated model for every speaking style. The paper presents our approach to modelling the acoustic parameters of prosody in two different speaking styles in Slovak. Our model is based on classification and regression trees (CARTs). It uses one independent CART for phoneme lengths and three CARTs for fundamental frequency (F0) at the beginning, centre, and end of every syllable. Two hours of read speech were used to train a model of read speech. Recordings of a puppet player were used to train a model of acted speech. The models were implemented in the Kempelen 2.2 unit-selection Slovak speech synthesizer. Listening tests have shown that the models capture a significant part of the differences between the two speaking styles.
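
As an illustration only, the four-tree layout described in the abstract (one CART for phoneme length, three CARTs for F0 at the beginning, centre, and end of a syllable) might be sketched as below. This is not the Kempelen 2.2 implementation: the tiny regression-tree learner, the binary features (`is_vowel`, `stressed`, `phrase_final`), and all numeric values are hypothetical toy assumptions chosen so the example runs on its own.

```python
from statistics import mean

def sse(values):
    """Sum of squared deviations from the mean (0 for an empty list)."""
    if not values:
        return 0.0
    m = mean(values)
    return sum((v - m) ** 2 for v in values)

def fit_cart(rows, targets, depth=0, max_depth=3):
    """Grow a small regression tree by greedy SSE-minimising splits.
    A leaf is a number; an inner node is (feature, threshold, left, right)."""
    if depth >= max_depth:
        return mean(targets)
    best = None
    for f in range(len(rows[0])):
        for threshold in sorted({r[f] for r in rows}):
            left = [i for i, r in enumerate(rows) if r[f] <= threshold]
            right = [i for i, r in enumerate(rows) if r[f] > threshold]
            if not left or not right:
                continue
            cost = (sse([targets[i] for i in left])
                    + sse([targets[i] for i in right]))
            if best is None or cost < best[0]:
                best = (cost, f, threshold, left, right)
    # Stop when no split exists or the best split does not reduce the error.
    if best is None or best[0] >= sse(targets):
        return mean(targets)
    _, f, threshold, left, right = best
    return (
        f,
        threshold,
        fit_cart([rows[i] for i in left], [targets[i] for i in left],
                 depth + 1, max_depth),
        fit_cart([rows[i] for i in right], [targets[i] for i in right],
                 depth + 1, max_depth),
    )

def predict(tree, row):
    """Walk from the root to a leaf and return the leaf value."""
    while isinstance(tree, tuple):
        f, threshold, left, right = tree
        tree = left if row[f] <= threshold else right
    return tree

# Hypothetical toy training data, NOT taken from the paper.
# Phoneme features: [is_vowel, phrase_final]; target: duration in ms.
phoneme_rows = [[1, 0], [1, 1], [0, 0], [0, 1]]
durations_ms = [80.0, 120.0, 50.0, 70.0]

# Syllable features: [stressed, phrase_final]; targets: F0 in Hz.
syllable_rows = [[0, 0], [1, 0], [0, 1], [1, 1]]
f0_hz = {
    "start":  [110.0, 130.0, 100.0, 115.0],
    "centre": [115.0, 140.0, 95.0, 110.0],
    "end":    [112.0, 135.0, 85.0, 95.0],
}

# One independent tree for duration, three for the F0 points.
duration_tree = fit_cart(phoneme_rows, durations_ms)
f0_trees = {pos: fit_cart(syllable_rows, y) for pos, y in f0_hz.items()}

def predict_contour(syllable):
    """Return the (start, centre, end) F0 predictions for one syllable."""
    return tuple(predict(f0_trees[p], syllable)
                 for p in ("start", "centre", "end"))

contour = predict_contour([1, 0])
```

Under this sketch, a separate set of four trees would be trained per speaking style (one set on read speech, one on acted speech), and the synthesizer would query the set matching the requested style.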
