Abstract
Many-to-many voice conversion (VC) is a technique that maps speech features between multiple speakers during training and transfers the vocal characteristics from a source speaker to a target speaker while keeping the linguistic content of the source speech unchanged. Existing research highlights a notable gap between the original and generated speech samples in terms of naturalness within many-to-many VC. Therefore, there is substantial room for improvement in achieving more natural-sounding speech samples for both parallel and nonparallel VC scenarios. In this study, we introduce a generative adversarial network (GAN) system with a guided loss (GLGAN-VC) designed to enhance many-to-many VC by focusing on architectural improvements and the integration of alternative loss functions. Our approach includes a pair-wise downsampling and upsampling (PDU) generator network for effective speech feature mapping (FM) in multidomain VC. In addition, we incorporate an FM loss to preserve content information and a residual connection (RC)-based discriminator network to improve learning. A guided loss (GL) function is introduced to efficiently capture differences in latent feature representations between source and target speakers, and an enhanced reconstruction loss is proposed for better contextual information preservation. We evaluate our model on various datasets, including VCC 2016, VCC 2018, VCC 2020, and an emotional speech dataset (ESD). Our results, based on both subjective and objective evaluation metrics, demonstrate that our model outperforms state-of-the-art (SOTA) many-to-many GAN-based VC models in terms of speech quality and speaker similarity in the generated speech samples.
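To make the loss formulations concrete, the following is a minimal, hedged sketch of the two auxiliary objectives named in the abstract: a feature-matching (FM) loss over paired discriminator feature maps and a guided loss over latent representations. The function names (`fm_loss`, `guided_loss`) and the flat-list representation of feature maps are illustrative assumptions, not the paper's actual implementation, which operates on neural network tensors.

```python
# Illustrative sketch only: the paper's FM and guided losses are defined
# over deep-network feature tensors; here we use plain lists of floats.

def fm_loss(real_feats, fake_feats):
    """Feature-matching loss: mean L1 distance between the discriminator's
    intermediate feature maps for real and generated speech.
    Each argument is a list of layers, each layer a list of floats."""
    total, count = 0.0, 0
    for r_layer, f_layer in zip(real_feats, fake_feats):
        for r, f in zip(r_layer, f_layer):
            total += abs(r - f)
            count += 1
    return total / count

def guided_loss(src_latent, tgt_latent):
    """Guided loss (toy form): mean squared difference between latent
    feature representations of the source and target speakers."""
    return sum((s - t) ** 2 for s, t in zip(src_latent, tgt_latent)) / len(src_latent)
```

In practice both terms would be weighted and added to the adversarial and reconstruction losses when training the generator; the weighting scheme here is left unspecified, as the abstract does not give it.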
Published in: IEEE Transactions on Neural Networks and Learning Systems