Transformation of prosody in voice conversion

Berrak Sisman, Kay Chen Tan, Haizhou Li

doi:10.1109/apsipa.2017.8282288

Abstract

Voice conversion (VC) aims to convert one speaker's voice to sound like that of another. So far, most voice conversion frameworks have focused only on converting the spectrum. However, speaker identity is also characterized by prosodic features such as fundamental frequency (F0), energy contour, and duration. Motivated by this, we propose a framework that converts F0, energy contour, and duration. In the traditional exemplar-based sparse representation approach to voice conversion, a general source-target dictionary of exemplars is constructed to establish the correspondence between source and target speakers. In this work, we propose a phonetically aware sparse representation of fundamental frequency and energy contour using the continuous wavelet transform (CWT). Our idea is motivated by the fact that CWT decompositions of F0 and energy contours describe prosody patterns at different temporal scales and allow for effective prosody manipulation in speech synthesis. Furthermore, phonetically aware exemplars lead to better estimation of the activation matrix and therefore possibly to better prosody conversion. We also propose a phonetically aware duration conversion framework that takes into account both phone-level and sentence-level speaking rates. We report that the proposed prosody conversion outperforms traditional prosody conversion techniques in both objective and subjective evaluations.
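The traditional exemplar-based idea mentioned in the abstract can be illustrated with a minimal sketch. It assumes nonnegative feature vectors, a source dictionary `A` and a target dictionary `B` whose columns are paired exemplars, and a sparse nonnegative activation matrix `H` estimated so that the source features satisfy X ≈ A H; the converted features are then taken as B H. The function names, the multiplicative update rule, the sparsity weight, and the toy data are all illustrative assumptions, not the authors' implementation. The CWT decomposition and the phonetic awareness proposed in the paper are not modeled here.

```python
import numpy as np


def estimate_activations(X, A, sparsity=0.01, n_iter=500, eps=1e-12):
    """Estimate nonnegative activations H such that X is approximately A @ H.

    The dictionary A stays fixed; only H is updated, using a multiplicative
    rule for the Euclidean cost with an L1 penalty (an assumed, generic choice).
    """
    rng = np.random.default_rng(0)
    H = rng.random((A.shape[1], X.shape[1])) + eps  # positive initialization
    AtX = A.T @ X
    AtA = A.T @ A
    for _ in range(n_iter):
        # Multiplicative update keeps every entry of H nonnegative.
        H *= AtX / (AtA @ H + sparsity + eps)
    return H


def convert(X, A, B, **kwargs):
    """Apply activations estimated on the source dictionary to the target one."""
    H = estimate_activations(X, A, **kwargs)
    return B @ H, H


# Toy data (hypothetical): 20 paired exemplars with 8-dimensional features.
rng = np.random.default_rng(1)
A = rng.random((8, 20))          # source exemplars, one per column
B = rng.random((8, 20))          # paired target exemplars
H_true = np.zeros((20, 5))
H_true[[2, 7, 11, 3, 15], range(5)] = 1.0   # each frame uses one exemplar
X = A @ H_true                   # five source frames to convert

Y, H = convert(X, A, B)
rel_err = np.linalg.norm(X - A @ H) / np.linalg.norm(X)
print("converted shape:", Y.shape)
print("relative reconstruction error:", rel_err)
```

Sharing `H` between the two dictionaries is what carries the source-target correspondence in this sketch: the better the activations explain the source frames, the more plausible the converted frames, which is the motivation the abstract gives for improving activation estimation.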
