A noise-robust voice conversion method with controllable background sounds

Lele Chen, Meng Sun, Weiwei Chen, Yihao Li, Xiongwei Zhang

doi:10.1007/s40747-024-01375-6

Abstract

Background noises are usually treated as redundant or even harmful to voice conversion. Therefore, when converting noisy speech, a pretrained speech separation module is usually deployed to estimate clean speech prior to the conversion. However, this can lead to speech distortion due to the mismatch between the separation module and the conversion module. In this paper, a noise-robust voice conversion model is proposed in which the user can freely choose to retain or remove the background sounds. First, a speech separation module with a dual-decoder structure is proposed, where the two decoders decode the denoised speech and the background sounds, respectively. A bridge module captures the interactions between the denoised speech and the background sounds in parallel layers through information exchange. Subsequently, a voice conversion module with multiple encoders converts the estimated clean speech from the speech separation module. Finally, the speech separation and voice conversion modules are jointly trained using a loss function that combines cycle loss and mutual information loss, aiming to improve the decoupling of speech content, pitch, and speaker identity. Experimental results show that the proposed model obtains significant improvements in both subjective and objective evaluation metrics compared with the existing baselines. The speech naturalness and speaker similarity of the converted speech are 3.47 and 3.43, respectively.
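The inference flow described in the abstract (separate, convert, then optionally keep the background) can be sketched as below. This is only an illustration under stated assumptions, not the authors' implementation: the names `convert_noisy_speech`, `toy_separate`, `toy_convert`, and the `background_gain` parameter are hypothetical, the toy functions stand in for the neural separation and conversion modules, and re-adding the background by sample-wise summation is an assumption that the abstract does not confirm.

```python
from typing import Callable, List, Tuple

Waveform = List[float]


def convert_noisy_speech(
    noisy: Waveform,
    separate: Callable[[Waveform], Tuple[Waveform, Waveform]],
    convert: Callable[[Waveform], Waveform],
    keep_background: bool = True,
    background_gain: float = 1.0,  # hypothetical control, not from the paper
) -> Waveform:
    """Separate the input, convert the speech estimate, optionally re-add background."""
    # Dual-decoder idea: one estimate for speech, one for background sounds.
    speech_est, background_est = separate(noisy)
    converted = convert(speech_est)
    if not keep_background:
        return converted
    if len(converted) != len(background_est):
        raise ValueError("converted speech and background must have equal length")
    # Assumption: the background is retained by simple sample-wise summation.
    return [s + background_gain * b for s, b in zip(converted, background_est)]


def toy_separate(noisy: Waveform) -> Tuple[Waveform, Waveform]:
    # Toy stand-in: pretend the background is a constant offset of 0.5.
    background = [0.5] * len(noisy)
    speech = [x - 0.5 for x in noisy]
    return speech, background


def toy_convert(speech: Waveform) -> Waveform:
    # Toy stand-in: flip polarity; a real module would change speaker identity.
    return [-x for x in speech]
```

Calling `convert_noisy_speech(noisy, toy_separate, toy_convert, keep_background=False)` returns only the converted speech estimate, while the default setting mixes the estimated background back in.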

Journal: Complex & Intelligent Systems
Publication Date: Feb 29, 2024
Citations: 1
License type: CC BY 4.0
