Abstract
This paper proposes a system that converts neutral speech into emotional speech with controlled emotional intensity. Most previous research on emotional speech synthesis has used statistical or concatenative methods that synthesize emotions as categorical states such as joy, anger, and sadness. However, humans routinely intensify or attenuate their emotional states in daily life, and categorical emotional speech synthesis cannot describe these phenomena precisely. A dimensional approach, which represents an emotion as a point in a dimensional space, can express emotions with continuous intensity. Adopting this dimensional description of emotion, we construct a three-layered model that estimates the displacement of the acoustic features of the target emotional speech from those of the source (neutral) speech, and we propose a rule-based conversion method that modifies the acoustic features of the source speech to synthesize the target emotional speech. To convert the source speech flexibly and easily, we introduce two methods for parameterizing the dynamic prosodic features: the Fujisaki model for the F0 contour and a target prediction model for the power envelope. Evaluation results show that listeners perceive the intended emotion with a satisfactory ordering of emotional intensity and naturalness. This indicates that the system can not only synthesize emotional speech by category but also control the ordering of emotional intensity in the dimensional space, even within the same emotion category.
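As background for the prosody parameterization mentioned above, the following is a minimal sketch of the standard Fujisaki model, in which log F0 is the sum of a baseline value, phrase components (impulse responses of a second-order critically damped system), and accent components (saturating step responses). The command timings, magnitudes, and time constants below are illustrative assumptions, not values from the paper, and the power-envelope target prediction model is not shown.

```python
import numpy as np

def fujisaki_f0(t, fb, phrase_cmds, accent_cmds, alpha=3.0, beta=20.0, gamma=0.9):
    """Sketch of the standard Fujisaki F0 model:
    ln F0(t) = ln Fb + sum of phrase components + sum of accent components.
    phrase_cmds: list of (onset time T0, magnitude Ap)
    accent_cmds: list of (onset T1, offset T2, amplitude Aa)
    alpha, beta, gamma and all command values are illustrative, not from the paper."""
    log_f0 = np.full_like(t, np.log(fb))

    # Phrase component: impulse response of a 2nd-order critically damped system
    for T0, Ap in phrase_cmds:
        tau = t - T0
        gp = np.where(tau >= 0, alpha**2 * tau * np.exp(-alpha * tau), 0.0)
        log_f0 += Ap * gp

    # Accent component: step response, saturating at gamma
    def ga(tau):
        resp = np.where(tau >= 0, 1.0 - (1.0 + beta * tau) * np.exp(-beta * tau), 0.0)
        return np.minimum(resp, gamma)

    for T1, T2, Aa in accent_cmds:
        log_f0 += Aa * (ga(t - T1) - ga(t - T2))

    return np.exp(log_f0)  # F0 contour in Hz

# Example: one phrase command and one accent command over a 2-second utterance.
# Scaling Ap/Aa up or down is one way such a parameterization lets a rule-based
# system exaggerate or soften prosodic movements.
t = np.linspace(0.0, 2.0, 400)
f0 = fujisaki_f0(t, fb=120.0,
                 phrase_cmds=[(0.0, 0.5)],
                 accent_cmds=[(0.4, 0.9, 0.4)])
```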