Emotional Speech Synthesis Based on Prosodic Feature Modification

Ling He, Hua Huang, Margaret Lech

doi:10.4236/eng.2013.510b015

Abstract

The synthesis of emotional speech has wide applications in fields such as human-computer interaction, medicine, and industry. In this work, an emotional speech synthesis system is proposed based on prosodic feature modification and the Time Domain Pitch Synchronous OverLap Add (TD-PSOLA) waveform concatenation algorithm. The system produces synthesized speech with four types of emotion: angry, happy, sad, and bored. The experimental results show that the proposed system performs well: the produced utterances present clear emotional expression, and listeners in the subjective test classify the different types of synthesized emotional utterances with high accuracy.

Highlights

Modern speech synthesis systems have a wide variety of applications
An emotional speech synthesis system is proposed based on prosodic feature modification and the Time Domain Pitch Synchronous OverLap Add (TD-PSOLA) waveform concatenation algorithm
To evaluate the performance of the proposed emotional speech synthesis system, a subjective test is conducted

Summary

Introduction

Modern speech synthesis systems have a wide variety of applications. In call centers, for example, a speech synthesizer can conduct dialogues with customers. Most modern speech synthesizers produce voice (an acoustic waveform) from text. Emotional speech synthesis aims to add human emotion to synthesized speech so that it sounds more natural and affective. Two major approaches to emotional speech synthesis dominate the literature: formant synthesis and concatenative synthesis [1]. To produce a variety of emotions, a concatenative system requires a large speech database from which to build a pool of selectable units [6,7,8,9]. To reduce this requirement, several researchers incorporate prosodic strategies into unit selection [10,11]. In this work, an emotional speech synthesis system is proposed based on prosodic feature modification and the TD-PSOLA concatenative synthesis method.
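The section outline that follows names fundamental frequency, energy, and time duration as the prosodic features the system works with, but this summary does not reproduce the paper's own calculations. As a rough illustration only, the sketch below measures those three quantities on a synthetic tone. The function names, the autocorrelation-based pitch estimate, and the 75-400 Hz search range are assumptions made for this example, not the authors' method.

```python
import math


def short_time_energy(frame):
    """Mean squared amplitude of one frame of samples."""
    return sum(x * x for x in frame) / len(frame)


def estimate_f0(frame, sample_rate, f0_min=75.0, f0_max=400.0):
    """Estimate fundamental frequency (Hz) from the autocorrelation peak.

    The search range f0_min..f0_max is an assumed range for speech,
    chosen for illustration. Returns 0.0 if no positive peak is found.
    """
    lag_min = int(sample_rate / f0_max)
    lag_max = min(int(sample_rate / f0_min), len(frame) - 1)
    best_lag, best_score = 0, 0.0
    for lag in range(lag_min, lag_max + 1):
        # Unnormalized autocorrelation of the frame at this lag.
        score = sum(frame[i] * frame[i + lag] for i in range(len(frame) - lag))
        if score > best_score:
            best_lag, best_score = lag, score
    return sample_rate / best_lag if best_lag else 0.0


def duration_seconds(signal, sample_rate):
    """Length of the signal in seconds."""
    return len(signal) / sample_rate


# Synthetic stand-in for a voiced segment: a 200 Hz sine, 0.1 s at 8 kHz.
sample_rate = 8000
tone = [0.5 * math.sin(2 * math.pi * 200 * n / sample_rate) for n in range(800)]

f0 = estimate_f0(tone, sample_rate)
energy = short_time_energy(tone)
duration = duration_seconds(tone, sample_rate)
print(f0, energy, duration)
```

A prosody-modification system would typically scale measurements like these toward target values for each emotion before resynthesis. The specific targets used in the paper are not given in this summary.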

Emotional Speech Synthesis System

Speech Database

Calculation of Fundamental Frequency

Calculation of Energy

Calculation of Time Duration

TD-PSOLA Method

Experiments and Results

Conclusions and Discussion


Journal: Engineering	Publication Date: Jan 1, 2013
Citations: 1	License type: CC BY 4.0


Similar Papers

High Quality Arabic Concatenative Speech Synthesis
Abdelkader Chabchoub
Signal & Image Processing : An International Journal | VOL. 2
31 Dec 2011

Detecting depression in speech: Comparison and combination between different speech types
Hailiang Long ... Zhenyu Liu
01 Nov 2017

Speech synthesis of emotions using vowel features of a speaker
Kanu Boku ... Masayoshi Tabuse
Artificial Life and Robotics | VOL. 19
30 Oct 2013

Emotional Speech Recognition Based on Lip-Reading
Elena Ryumina ... Denis Ivanko
01 Jan 2021
