Abstract

Problem statement: In spontaneous speech communication, prosody is an important factor that must be taken into account, since it affects not only the naturalness but also the intelligibility of speech. A number of systems for synthesizing Thai expressive speech have been developed over the years. However, expressive speech with various speaking styles has not yet been achieved. To generate expressive speech, the fundamental frequency (F0) contours must be modeled accurately to preserve the quality of speech prosody. Approach: This study presents a comparison of two successful F0 models. One approach is based on the Fujisaki

Highlights

  • Fujisaki’s model has been applied within a speaker-independent system as extended modules. It has been used to model Thai expressive speech in sad, happy, and angry styles (Chomphan and Kobayashi, 2008; 2009).

  • Another study used a structural model based on the assumption that the behavioral characteristics of vocal-fold elongation during vibration can be approximated by those of a simple forced vibrating system (Ni and Hirose, 2006; Chomphan and Kobayashi, 2009).

  • The RMS error has been calculated to evaluate the modeling performance of both speech models for all speech styles, including angry style, sad style, enjoyable style and

Summary

Introduction

Fujisaki’s model has been applied within a speaker-independent system as extended modules. It has been used to model Thai expressive speech in sad, happy, and angry styles (Chomphan and Kobayashi, 2008; 2009). To find the optimal representative parameters, the mean squared error in the ln F0(t) domain is minimized by a hill-climbing search in the space of model parameters (Seresangtakul and Takara, 2003). To use this model, the parameters are extracted from the speech database utterance by utterance.
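
The parameter search described above can be illustrated with a minimal sketch. It assumes the commonly cited form of Fujisaki's model with a single phrase command and a single accent command; the variant used for Thai in the cited studies may differ. The parameter layout, the function names, the step size, and the constants alpha = 3.0, beta = 20.0 and gamma = 0.9 are illustrative assumptions, not values taken from the study, and the random-perturbation hill climber is one simple form of hill-climbing search rather than the authors' exact procedure.

```python
import math
import random


def phrase_response(t, alpha=3.0):
    """Impulse response of the phrase control mechanism (zero for t < 0).
    alpha is an assumed typical value, not taken from the study."""
    return alpha * alpha * t * math.exp(-alpha * t) if t >= 0 else 0.0


def accent_response(t, beta=20.0, gamma=0.9):
    """Step response of the accent control mechanism, capped at gamma.
    beta and gamma are assumed typical values."""
    if t < 0:
        return 0.0
    return min(1.0 - (1.0 + beta * t) * math.exp(-beta * t), gamma)


def ln_f0(t, params):
    """ln F0(t) for one phrase command and one accent command.

    params = [ln_fb, ap, t0, aa, t1, t2] is a hypothetical layout:
    baseline, phrase magnitude and onset, accent amplitude, onset, offset.
    """
    ln_fb, ap, t0, aa, t1, t2 = params
    return (ln_fb
            + ap * phrase_response(t - t0)
            + aa * (accent_response(t - t1) - accent_response(t - t2)))


def mse(params, times, observed_ln_f0):
    """Mean squared error between model and observation in the ln F0 domain."""
    total = sum((ln_f0(t, params) - y) ** 2
                for t, y in zip(times, observed_ln_f0))
    return total / len(times)


def hill_climb(initial, times, observed_ln_f0,
               step=0.05, iterations=2000, seed=0):
    """Perturb one randomly chosen parameter at a time and keep the
    change only when it lowers the error."""
    rng = random.Random(seed)
    best = list(initial)
    best_err = mse(best, times, observed_ln_f0)
    for _ in range(iterations):
        candidate = list(best)
        i = rng.randrange(len(candidate))
        candidate[i] += rng.uniform(-step, step)
        err = mse(candidate, times, observed_ln_f0)
        if err < best_err:
            best, best_err = candidate, err
    return best, best_err
```

In use, `hill_climb` would be called once per utterance with the sampled frame times and the observed ln F0 values of that utterance, starting from a rough initial guess of the commands. Because hill climbing only finds a local minimum, the result depends on that initial guess.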

