Abstract

Most existing voice conversion methods focus on separating speech content from speaker information while overlooking the decoupling of pitch information. In addition, the quality of converted speech degrades significantly when the target speaker's speech is contaminated by noise. To address these issues, this paper proposes a noise-robust voice conversion model with multi-feature decoupling based on adversarial training. The proposed framework uses three distinct encoders to encode speech content, speaker identity, and pitch information independently, improving decoupling by minimizing the mutual information among the representations and reducing the correlations between the feature vectors. Moreover, a gradient reversal layer and a noise decoupling discriminator are incorporated into the framework; through adversarial training, they extract noise-resistant speaker and content representations that facilitate the synthesis of high-quality speech. To optimize the learning process, a training strategy is developed that alternates between clean and noisy data when training the encoders, which effectively guides and accelerates the convergence of the model. Experimental results demonstrate that, compared with state-of-the-art noise-robust voice conversion baselines, the proposed model achieves improvements of approximately 0.31 and 0.39 on the speech naturalness and speaker similarity evaluation metrics, respectively.
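The abstract does not include code, but the gradient reversal layer it names is a well-established adversarial-training technique. The sketch below shows how it is commonly implemented in PyTorch: an identity mapping in the forward pass whose gradients are negated in the backward pass, so the encoder feeding a noise discriminator is pushed toward noise-invariant features. The `encoder` and `noise_discriminator` modules here are hypothetical stand-ins for illustration, not the paper's actual architecture.

```python
import torch
from torch.autograd import Function


class GradientReversal(Function):
    """Identity in the forward pass; negates (and scales) the gradient in the
    backward pass, so the upstream encoder learns to fool the discriminator."""

    @staticmethod
    def forward(ctx, x, lambd):
        ctx.lambd = lambd
        return x.view_as(x)

    @staticmethod
    def backward(ctx, grad_output):
        # Reverse (and scale) the gradient flowing back into the encoder.
        return -ctx.lambd * grad_output, None


def grad_reverse(x, lambd=1.0):
    return GradientReversal.apply(x, lambd)


# Hypothetical usage: a noise discriminator classifies features as coming from
# clean or noisy speech; because its gradients are reversed before reaching
# the encoder, the encoder is trained to remove noise cues from its output.
encoder = torch.nn.Linear(80, 256)             # stand-in for a speech encoder
noise_discriminator = torch.nn.Linear(256, 2)  # clean-vs-noisy classifier

features = encoder(torch.randn(8, 80))         # batch of 8 dummy feature frames
logits = noise_discriminator(grad_reverse(features, lambd=1.0))
labels = torch.randint(0, 2, (8,))             # 0 = clean, 1 = noisy (dummy)
loss = torch.nn.functional.cross_entropy(logits, labels)
loss.backward()  # discriminator descends this loss; the encoder ascends it
```

Under this scheme, a single backward pass trains the discriminator to detect noise while simultaneously training the encoder to suppress it, which matches the adversarial decoupling role the abstract assigns to the gradient reversal layer.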
