Abstract
Typically, voice conversion systems just use one type of spectral feature to convert acoustical characteristics of one speaker to another speaker. In this paper, we first study four different spectral features. Then, we compare these features and choose two features that perform better than others. Our experiments showed that cepstral features are more suitable than all-pole features for clustering and all-pole features are better for the analysis/synthesis stages. Hence, we propose a new voice conversion algorithm that uses both cepstral and all-pole features in order to utilize their desired properties simultaneously. We have two ideas to utilize this feature combination strategy. Our first idea is to apply feature combination to classical Gaussian mixture models (GMM)-based voice conversion method. The second idea is to apply feature combination to dynamic kernel partial least square regression (DKPLS) method. Results of our evaluations show that our proposed methods outperform the modern voice conversion methods in terms of speech quality and speaker individuality. Our methods are also robust to limited training data.
Published Version
Talk to us
Join us for a 30 min session where you can share your feedback and ask us any queries you have