Voice conversion based on feature combination with limited training data

Mostafa Ghorbandoost, Abolghasem Sayadiyan, Mohsen Ahangar, Hamid Sheikhzadeh, Abdoreza Sabzi Shahrebabaki, Jamal Amini

doi:10.1016/j.specom.2014.12.004

Abstract

Voice conversion systems typically use only one type of spectral feature to convert the acoustic characteristics of one speaker to those of another. In this paper, we first study four different spectral features, compare them, and select the two that perform best. Our experiments show that cepstral features are better suited than all-pole features to clustering, whereas all-pole features are better suited to the analysis/synthesis stages. Hence, we propose a new voice conversion algorithm that uses both cepstral and all-pole features in order to exploit the strengths of each simultaneously. We apply this feature combination strategy in two ways. The first applies feature combination to the classical Gaussian mixture model (GMM)-based voice conversion method. The second applies it to the dynamic kernel partial least squares regression (DKPLS) method. Our evaluations show that the proposed methods outperform modern voice conversion methods in terms of speech quality and speaker individuality. The proposed methods are also robust to limited training data.
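
To make the feature combination idea concrete, the following is a minimal sketch and not the authors' algorithm. It assumes a diagonal-covariance GMM whose component posteriors are computed on the cepstral vector of a frame, and a hypothetical per-component scale-and-bias mapping that is applied to the all-pole vector of the same frame. The abstract does not name the all-pole representation or the form of the mapping, so the function names, the simplified mapping, and all toy parameter values below are assumptions chosen for illustration only.

```python
import math


def gaussian_logpdf_diag(x, mean, var):
    """Log density of a diagonal-covariance Gaussian evaluated at vector x."""
    return sum(
        -0.5 * (math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v)
        for xi, m, v in zip(x, mean, var)
    )


def posteriors(cepstrum, weights, means, variances):
    """Posterior probability of each mixture component given a cepstral vector."""
    logs = [
        math.log(w) + gaussian_logpdf_diag(cepstrum, m, v)
        for w, m, v in zip(weights, means, variances)
    ]
    top = max(logs)  # subtract the maximum for numerical stability
    exps = [math.exp(value - top) for value in logs]
    total = sum(exps)
    return [e / total for e in exps]


def convert_frame(cepstrum, allpole, gmm, transforms):
    """Cluster on the cepstral vector, then map the all-pole vector.

    `transforms` holds one (scale, bias) pair per mixture component.
    This per-dimension mapping is an assumed simplification, not the
    regression used in the paper.
    """
    post = posteriors(cepstrum, *gmm)
    out = [0.0] * len(allpole)
    for p, (scale, bias) in zip(post, transforms):
        for i, a in enumerate(allpole):
            out[i] += p * (scale[i] * a + bias[i])
    return out


# Toy, made-up parameters (two components, 2-D features), for illustration only.
TOY_GMM = ([0.5, 0.5], [[0.0, 0.0], [4.0, 4.0]], [[1.0, 1.0], [1.0, 1.0]])
TOY_TRANSFORMS = [([1.0, 1.0], [0.1, 0.1]), ([1.0, 1.0], [-0.1, -0.1])]

# A cepstral vector halfway between the two component means.
converted = convert_frame([2.0, 2.0], [0.3, 0.7], TOY_GMM, TOY_TRANSFORMS)
```

The point of the sketch is the division of labour: the soft cluster assignment depends only on the cepstral features, while the values that are actually transformed, and that would later drive synthesis, are the all-pole features.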