Abstract

This paper proposes VStyclone, a novel Chinese speech cloning model that operates in three stages: multi-speaker training, target speaker encoding, and target speaker synthesis. We design an efficient tone extractor that reallocates computational resources across the sequences of log-mel spectrogram frames obtained from multiple speakers' speech, allowing the network to weight different speakers' features differently. This lets the network focus on the voice characteristics of the target speaker and extract the target features accurately. To cluster utterances of the same speaker while dispersing those of different speakers, we build an optimal softmax loss to optimize the model. We then develop a style synthesizer that adopts a transformer architecture in place of a recurrent neural network, so the model can both process text in parallel and better handle long-range contextual information. A style extraction module embedded in the style synthesizer dynamically captures style ranges in an unsupervised manner. In addition, VStyclone uses a generative adversarial network as the base generation model of its vocoder to improve generation speed, synthesizing speech 1.2 times faster than real time on a CPU, and the full model achieves state-of-the-art results.
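The abstract does not specify the exact form of the speaker-clustering softmax loss. As a rough, non-authoritative illustration of the idea (pulling same-speaker embeddings together and pushing different speakers apart via a softmax over speaker centroids), the sketch below follows the well-known GE2E-style formulation; the function name, tensor shapes, and the use of PyTorch are all assumptions, not the paper's implementation.

```python
# Hypothetical sketch of a softmax-based speaker-clustering loss.
# emb is assumed to hold speaker-encoder outputs arranged as
# (N speakers, M utterances per speaker, D-dim embeddings).
import torch
import torch.nn.functional as F

def speaker_softmax_loss(emb: torch.Tensor) -> torch.Tensor:
    N, M, D = emb.shape
    emb = F.normalize(emb, dim=-1)                    # unit-length embeddings
    centroids = F.normalize(emb.mean(dim=1), dim=-1)  # (N, D) speaker centroids
    # Cosine similarity of every utterance to every speaker centroid: (N, M, N)
    sim = torch.einsum('nmd,kd->nmk', emb, centroids)
    # Softmax cross-entropy over centroids: each utterance should be closest
    # to its own speaker's centroid, clustering same-speaker voices and
    # separating different speakers.
    labels = torch.arange(N).unsqueeze(1).expand(N, M).reshape(-1)
    return F.cross_entropy(sim.reshape(N * M, N), labels)

# Example: 4 speakers x 5 utterances each, 256-dim embeddings
loss = speaker_softmax_loss(torch.randn(4, 5, 256))
```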
