Abstract

The objective of voice conversion methods is to modify the speech characteristics of a particular speaker so that it sounds as if it were spoken by a different target speaker. Current voice conversion algorithms derive a conversion function whose parameters are estimated from a corpus containing the same utterances spoken by both speakers. Such a corpus, usually referred to as a parallel corpus, has the disadvantage that it is often difficult or even impossible to collect. Here, we propose a voice conversion method that does not require a parallel corpus for training, i.e. the two speakers need not utter the same sentences: speaker adaptation techniques are employed to adapt conversion parameters derived from a different pair of speakers to the particular source and target speaker pair at hand. We show that adaptation reduces the error incurred when simply applying the conversion parameters of one pair of speakers to another by a factor that can reach 30% in many cases, with performance comparable to the ideal case in which a parallel corpus is available.
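To make the setting concrete, the sketch below shows a standard GMM-based conversion function of the kind the abstract refers to, F(x) = sum_m P(m|x) (mu_y_m + A_m (x - mu_x_m)), whose parameters are ordinarily estimated from a parallel corpus. It is an illustrative assumption, not the authors' implementation; the function name convert_frame, the per-mixture transforms A, and the naive full-covariance posterior computation are all hypothetical. The paper's contribution is reusing such parameters trained on a different speaker pair and adapting them to a new pair without parallel data.

```python
import numpy as np

# Minimal sketch (assumed, not the authors' code): GMM-based spectral
# conversion. Parameters would normally come from a parallel corpus;
# the paper instead adapts parameters trained on another speaker pair.

def convert_frame(x, weights, mu_x, mu_y, A, cov_x):
    """Map one source spectral frame x toward the target speaker.

    weights: (M,)       GMM mixture weights
    mu_x, mu_y: (M, D)  per-mixture source/target mean vectors
    A: (M, D, D)        per-mixture linear transforms (illustrative)
    cov_x: (M, D, D)    source covariances used for the posteriors
    """
    M, D = mu_x.shape
    # Posterior probability P(m | x) of each mixture component
    post = np.empty(M)
    for m in range(M):
        diff = x - mu_x[m]
        inv = np.linalg.inv(cov_x[m])
        norm = 1.0 / np.sqrt((2 * np.pi) ** D * np.linalg.det(cov_x[m]))
        post[m] = weights[m] * norm * np.exp(-0.5 * diff @ inv @ diff)
    post /= post.sum()
    # Posterior-weighted sum of per-mixture linear regressions
    y = np.zeros(D)
    for m in range(M):
        y += post[m] * (mu_y[m] + A[m] @ (x - mu_x[m]))
    return y
```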
