Abstract

The statistical approach to voice conversion typically consists of a feature conversion module followed by a vocoder. So far, feature conversion studies have focused mainly on the conversion of the spectrum. However, speaker identity is also characterized by prosodic features, such as the fundamental frequency (F0) and the energy contour, among others. In this paper, we study the transformation of speaker characteristics in terms of both spectrum and prosody. We propose two novel techniques that make effective use of a limited amount of source-target training data and leverage a large general speech corpus to improve voice conversion quality. First, we study phonetic sparse representation under a group sparsity formulation. We use phonetic posteriorgrams (PPGs) together with spectral and prosodic features to form tandem features in the phonetic dictionary. The tandem features allow us to estimate an activation matrix that is less dependent on the source speaker, thus providing better voice conversion quality. Second, we study the use of a WaveNet vocoder that can be trained on a general speech corpus from multiple speakers and adapted on target speaker data to improve vocoding quality. We benefit from the large general speech databases that are used to train both the PPG generator and the WaveNet vocoder. The experiments show that the proposed conversion framework outperforms traditional spectrum and prosody conversion techniques in both objective and subjective evaluations.
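
To make the role of the activation matrix concrete, the following is a minimal sketch of dictionary-based conversion, not the method evaluated in the paper. It assumes non-negative features, paired source and target dictionaries whose columns are aligned, and a plain L1 sparsity penalty substituted for the group sparsity formulation to keep the example short. The function names (`estimate_activations`, `convert`) and all parameter values are hypothetical.

```python
import numpy as np


def estimate_activations(X, A, lam=0.1, n_iter=200, eps=1e-12):
    """Estimate a non-negative, sparse activation matrix H with X ~= A @ H.

    X: (d, T) non-negative source features (assumed: stacked tandem features).
    A: (d, K) non-negative source dictionary, kept fixed.
    Minimizes 0.5 * ||X - A H||^2 + lam * sum(H) with multiplicative updates.
    NOTE: an L1 penalty is used here as a simplification; the abstract
    describes a group sparsity formulation instead.
    """
    rng = np.random.default_rng(0)
    H = rng.random((A.shape[1], X.shape[1])) + 1e-3  # strictly positive start
    AtX = A.T @ X
    AtA = A.T @ A
    for _ in range(n_iter):
        # Multiplicative update keeps every entry of H non-negative.
        H *= AtX / (AtA @ H + lam + eps)
    return H


def convert(X_src, A_src, A_tgt, **kwargs):
    """Apply activations estimated on the source side to the target dictionary.

    A_src and A_tgt are assumed to be paired: column k of each dictionary
    comes from the same aligned frame of the parallel training data.
    """
    H = estimate_activations(X_src, A_src, **kwargs)
    return A_tgt @ H, H
```

A call such as `Y, H = convert(X_src, A_src, A_tgt, lam=0.01)` returns the converted features `Y` along with the activations `H`. In this sketch the only link between the two speakers is `H`, which is why an activation matrix that depends less on the source speaker is desirable.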
