Abstract

The synthesis of emotional speech has wide applications in fields such as human-computer interaction, medicine, and industry. This work proposes an emotional speech synthesis system based on prosodic feature modification and the Time Domain Pitch Synchronous OverLap Add (TD-PSOLA) waveform concatenation algorithm. The system produces synthesized speech with four types of emotion: angry, happy, sad, and bored. The experimental results show that the synthesized utterances express emotion clearly, and that listeners in a subjective test classified the different types of synthesized emotional speech with high accuracy.

Highlights

  • Modern speech synthesis systems have a wide variety of applications

  • An emotional speech synthesis system is proposed based on prosodic feature modification and the Time Domain Pitch Synchronous OverLap Add (TD-PSOLA) waveform concatenation algorithm

  • A subjective test is conducted to evaluate the performance of the proposed emotional speech synthesis system

Summary

Introduction

Modern speech synthesis systems have a wide variety of applications. In call centers, for example, a speech synthesizer can conduct dialogues with customers. Most modern speech synthesizers produce voice (an acoustic waveform) from text. Emotional speech synthesis aims to add human emotion to synthesized speech so that it sounds more natural and affective. Two major approaches to emotional speech synthesis dominate the literature: formant synthesis and concatenative synthesis [1]. To produce a variety of emotions, a unit-selection system requires a large speech database from which to build its pool of selectable units [6,7,8,9]. To reduce this database requirement, several researchers incorporate prosodic strategies into unit selection [10,11]. This work proposes an emotional speech synthesis system based on prosodic feature modification and the TD-PSOLA concatenative synthesis method.
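The three prosodic features named in the section outline (fundamental frequency, energy, and time duration) can be illustrated with a minimal sketch. This is not the paper's method, which this summary does not show: the autocorrelation pitch estimator, the sum-of-squares energy, the function names, the 50 to 400 Hz search range, and the synthetic 200 Hz test tone are all assumptions chosen for illustration.

```python
import math


def short_time_energy(frame):
    """Energy of one frame, taken here as the sum of squared samples."""
    return sum(s * s for s in frame)


def estimate_f0(frame, sample_rate, f0_min=50.0, f0_max=400.0):
    """Estimate F0 in Hz from the autocorrelation peak (assumed technique).

    The lag range is limited to periods between 1/f0_max and 1/f0_min.
    Returns 0.0 when no positive autocorrelation peak is found.
    """
    min_lag = int(sample_rate / f0_max)
    max_lag = min(int(sample_rate / f0_min), len(frame) - 1)
    best_lag, best_score = 0, 0.0
    for lag in range(min_lag, max_lag + 1):
        # Unnormalized autocorrelation at this lag.
        score = sum(frame[i] * frame[i + lag] for i in range(len(frame) - lag))
        if score > best_score:
            best_lag, best_score = lag, score
    return sample_rate / best_lag if best_lag else 0.0


def duration_seconds(num_samples, sample_rate):
    """Time duration of a segment in seconds."""
    return num_samples / sample_rate


# Hypothetical input: a 40 ms, 200 Hz sine tone sampled at 16 kHz.
sample_rate = 16000
tone = [math.sin(2 * math.pi * 200.0 * n / sample_rate) for n in range(640)]

f0 = estimate_f0(tone, sample_rate)
energy = short_time_energy(tone)
duration = duration_seconds(len(tone), sample_rate)
print(f0, energy, duration)
```

A prosody-modification stage would then scale measurements like these toward target values for each emotion before waveform concatenation. Real speech would also need a voiced/unvoiced decision before pitch estimation, which this sketch omits.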

Emotional Speech Synthesis System
Speech Database
Calculation of Fundamental Frequency
Calculation of Energy
Calculation of Time Duration
TD-PSOLA Method
Experiments and Results
Conclusions and Discussion