Parallel voice conversion with limited training data using stochastic variational deep kernel learning

Mohamadreza Jafaryani, Hamid Sheikhzadeh, Vahid Pourahmadi

doi:10.1016/j.engappai.2022.105279

Open Access

https://doi.org/10.1016/j.engappai.2022.105279

Abstract

There are two types of voice conversion methods: statistical and deep learning-based. Although statistical methods can be trained with limited data, they face challenges, including spectral oversmoothing and time-domain discontinuity. On the other hand, extensively researched deep learning-based methods rely primarily on massive amounts of data, which limits their practical applicability. Given that voice conversion is an engineering problem with limited training data, it is crucial to develop techniques that can produce satisfactory results in terms of quality and similarity in the absence of a large amount of data. This paper proposes a voice conversion model based on stochastic variational deep kernel learning (SVDKL), which works with limited training data. The model combines the expressive capability of a deep neural network with the high flexibility of the Gaussian process, which is a Bayesian and non-parametric method. The model uses a cascade of a deep neural network and a conventional kernel as the covariance function, which enables it to estimate non-smooth and more complex functions. Furthermore, the model’s sparse variational Gaussian process solves the scalability problem of exact inference and enables the learning of a global mapping function for the entire acoustic space. One of the most important aspects of the proposed scheme is that the model parameters are trained using marginal likelihood optimization, which takes into account both data fitting and model complexity. Accounting for model complexity increases robustness to overfitting, which in turn reduces the amount of training data required. To evaluate the proposed scheme, we examined the model’s performance with as little as approximately 80 s of training data. The results indicated that our method obtains a higher mean opinion score, smaller spectral distortion, and better preference test results than the state-of-the-art limited data methods.
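
The two ideas in the abstract that lend themselves to a small illustration are the deep kernel (a conventional kernel applied to neural network features) and the marginal likelihood, which balances data fit against model complexity. The following is a toy sketch only, not the authors' implementation: it uses an exact Gaussian process on six made-up points rather than the sparse variational approximation the paper describes, the network weights are random and untrained, and every name in it (`mlp_features`, `deep_rbf_kernel`, `log_marginal_likelihood`, `noise_var`) is hypothetical.

```python
import math
import random


def mlp_features(x, weights):
    """Hypothetical one-hidden-layer tanh network g(x) mapping inputs to features."""
    return [math.tanh(sum(w * xi for w, xi in zip(row, x)) + b) for row, b in weights]


def deep_rbf_kernel(x1, x2, weights, lengthscale=1.0, variance=1.0):
    """Deep kernel: an RBF kernel evaluated on network features g(x1), g(x2)."""
    f1, f2 = mlp_features(x1, weights), mlp_features(x2, weights)
    sq_dist = sum((a - b) ** 2 for a, b in zip(f1, f2))
    return variance * math.exp(-0.5 * sq_dist / lengthscale ** 2)


def cholesky(A):
    """Lower-triangular Cholesky factor of a symmetric positive definite matrix."""
    n = len(A)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = A[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            L[i][j] = math.sqrt(s) if i == j else s / L[j][j]
    return L


def log_marginal_likelihood(X, y, weights, noise_var=0.1):
    """Return the three terms of the exact GP log marginal likelihood.

    log p(y | X) = -0.5 * y^T K^{-1} y  - 0.5 * log|K|  - 0.5 * n * log(2*pi)
                    (data fit)            (complexity)    (constant)
    where K is the deep-kernel Gram matrix plus noise_var on the diagonal.
    """
    n = len(X)
    K = [[deep_rbf_kernel(X[i], X[j], weights) + (noise_var if i == j else 0.0)
          for j in range(n)] for i in range(n)]
    L = cholesky(K)
    # Forward substitution solves L z = y, so that y^T K^{-1} y = z^T z.
    z = []
    for i in range(n):
        z.append((y[i] - sum(L[i][k] * z[k] for k in range(i))) / L[i][i])
    data_fit = -0.5 * sum(v * v for v in z)
    complexity = -sum(math.log(L[i][i]) for i in range(n))  # equals -0.5 * log|K|
    constant = -0.5 * n * math.log(2.0 * math.pi)
    return data_fit, complexity, constant


# Made-up toy data: 6 two-dimensional inputs and scalar targets (assumption).
rng = random.Random(0)
weights = [([rng.gauss(0, 1) for _ in range(2)], rng.gauss(0, 1)) for _ in range(4)]
X = [[rng.uniform(-1, 1), rng.uniform(-1, 1)] for _ in range(6)]
y = [math.sin(3 * a) + b for a, b in X]

data_fit, complexity, constant = log_marginal_likelihood(X, y, weights)
lml = data_fit + complexity + constant
print("data fit:", data_fit, "complexity:", complexity, "total:", lml)
```

In a trainable version, the network weights, kernel hyperparameters, and noise variance would all be adjusted to maximize this objective (or, in the sparse variational setting the abstract refers to, a lower bound on it). The complexity term penalizes overly flexible covariance functions, which is the mechanism the abstract credits for robustness to overfitting.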

Journal: Engineering Applications of Artificial Intelligence
Publication Date: Aug 11, 2022
Citations: 4

Similar Papers

Japanese pitch conversion for voice morphing based on differential modeling
Ryuki Tachibana ... Masafumi Nishimura
06 Sep 2009

Regularizing Generative Adversarial Networks under Limited Data
Hung-Yu Tseng ... Lu Jiang
01 Jun 2021

Error Reduction Network for DBLSTM-based Voice Conversion
Mingyang Zhang ... Haizhou Li
01 Nov 2018

Voice conversion based on Gaussian processes by coherent and asymmetric training with limited training data
Ning Xu ... Zhen Yang
Speech Communication | VOL. 58
26 Nov 2013
