Intonation and prosody conversion for expressive Mandarin speech synthesis

Jing Zhu, Yibiao Yu

doi:10.1109/icosp.2012.6491547

Abstract

Expressive speech synthesis has a wide variety of applications. In contrast to general-purpose Chinese speech synthesis, this paper focuses on prosody and intonation. Prosody is described in terms of three aspects: accent, pause, and speaking speed. Accent is produced by modifying the fundamental frequency and amplitude. A pause is created by inserting frames whose parameter values are zero. Speaking speed is controlled by copying or deleting frames at specified locations. Because Mandarin is a tonal language, intonation plays a significant role in synthesis. Four intonation patterns are considered: rising, falling, flat, and sinuate. Each pattern is modeled with a polynomial fitting function, and the resulting intonation models are used to convert one pattern into another. Experimental results show that the proposed method achieves good quality in intonation conversion and substantially improves the naturalness of the synthesized speech.
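The frame-level operations and the polynomial intonation model summarized in the abstract can be illustrated with a minimal Python sketch. Everything specific here is an assumption rather than the authors' implementation: the function names (`stress`, `insert_pause`, `change_speed`, `polyfit`, `polyval`, `convert_intonation`), the representation of speech as a list of per-frame parameter vectors, the default polynomial degree, and in particular the conversion rule, which simply replaces the fitted trend of the source F0 contour with a target shape scaled by the mean F0. The sketch also assumes a fully voiced contour (no zero-valued F0 frames).

```python
def stress(values, start, end, gain):
    """Scale a per-frame track (F0 or amplitude) by `gain` over frames [start, end)."""
    return [v * gain if start <= i < end else v for i, v in enumerate(values)]

def insert_pause(frames, index, n_frames):
    """Insert `n_frames` all-zero parameter frames before position `index`."""
    width = len(frames[0])
    silence = [[0.0] * width for _ in range(n_frames)]
    return frames[:index] + silence + frames[index:]

def change_speed(frames, start, end, factor):
    """Stretch (factor > 1) or shrink (factor < 1) frames[start:end]
    by copying or dropping frames. Assumes start < end."""
    segment = frames[start:end]
    new_len = max(1, round(len(segment) * factor))
    resampled = [segment[min(len(segment) - 1, int(i / factor))]
                 for i in range(new_len)]
    return frames[:start] + resampled + frames[end:]

def polyfit(t, y, degree):
    """Least-squares polynomial coefficients (lowest order first),
    obtained by solving the normal equations with Gaussian elimination."""
    n = degree + 1
    A = [[sum(x ** (i + j) for x in t) for j in range(n)] for i in range(n)]
    b = [sum(v * x ** i for x, v in zip(t, y)) for i in range(n)]
    for col in range(n):
        # Partial pivoting for numerical stability.
        pivot = max(range(col, n), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        b[col], b[pivot] = b[pivot], b[col]
        for r in range(col + 1, n):
            m = A[r][col] / A[col][col]
            for c in range(col, n):
                A[r][c] -= m * A[col][c]
            b[r] -= m * b[col]
    coeffs = [0.0] * n
    for i in reversed(range(n)):
        s = sum(A[i][j] * coeffs[j] for j in range(i + 1, n))
        coeffs[i] = (b[i] - s) / A[i][i]
    return coeffs

def polyval(coeffs, x):
    """Evaluate a polynomial with coefficients given lowest order first."""
    return sum(c * x ** k for k, c in enumerate(coeffs))

def convert_intonation(f0, target_shape, degree=2):
    """Swap the fitted trend of `f0` for a target pattern (assumed rule).

    `target_shape` holds hypothetical polynomial coefficients over
    normalized time [0, 1], expressed relative to the mean F0.
    Requires at least two frames.
    """
    n = len(f0)
    t = [i / (n - 1) for i in range(n)]
    source_trend = polyfit(t, f0, degree)
    mean_f0 = sum(f0) / n
    return [v - polyval(source_trend, x) + mean_f0 * polyval(target_shape, x)
            for x, v in zip(t, f0)]

# Usage: turn a flat 200 Hz contour into a rising one (90% to 110% of the mean).
flat = [200.0] * 11
rising = convert_intonation(flat, target_shape=[0.9, 0.2])
```

Subtracting the fitted trend keeps the local F0 detail of the source contour and changes only its overall shape, which is one plausible reading of "convert one pattern to another"; the paper may define the conversion differently.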

Full Text