Abstract

This paper proposes a novel algorithm for Chinese voice conversion based on a phonetic Gaussian mixture model. The proposed method converts spectral features separately for each phoneme category using the phonetic Gaussian mixture model, which avoids the spectral smoothing of the traditional Gaussian mixture model (GMM) and the phoneme imbalance between training and testing materials, and thereby improves voice intelligibility and naturalness. Furthermore, pitch is modified by manipulating the linear prediction residual, using knowledge of the instants of significant excitation, in order to improve the quality of the synthesized speech. In an objective test of spectral similarity to the target voice, the proposed algorithm improved similarity by 9.31% compared with GMM. In subjective listening tests, the proposed algorithm was preferred over the baseline algorithm by 10.36% in an ABX test and improved quality by 29.33% in terms of mean opinion score (MOS).
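
As an illustration of per-phoneme-category spectral conversion, the following minimal sketch applies the standard joint-density GMM conditional-mean mapping, selecting one model per phoneme label. The abstract does not specify the conversion function, the feature type, or the training procedure, so the mapping itself, the dictionary keys (weights, mu_x, mu_y, cov_xx, cov_yx), the helper toy_model, and the phoneme labels "a" and "i" are all assumptions made for illustration, not the authors' implementation. Model parameters are assumed to be already trained.

```python
import numpy as np


def gaussian_pdf(x, mean, cov):
    """Density of a multivariate normal distribution evaluated at x."""
    diff = x - mean
    dim = x.shape[0]
    norm = np.sqrt(((2.0 * np.pi) ** dim) * np.linalg.det(cov))
    return float(np.exp(-0.5 * diff @ np.linalg.solve(cov, diff)) / norm)


def convert_frame(x, gmm):
    """Convert one source frame with the conditional-mean (MMSE) mapping."""
    n_mix = len(gmm["weights"])
    # Posterior probability of each mixture component given the source frame.
    lik = np.array([
        gmm["weights"][m] * gaussian_pdf(x, gmm["mu_x"][m], gmm["cov_xx"][m])
        for m in range(n_mix)
    ])
    post = lik / lik.sum()
    # Posterior-weighted sum of per-component linear regressions.
    y = np.zeros(gmm["mu_y"].shape[1])
    for m in range(n_mix):
        offset = np.linalg.solve(gmm["cov_xx"][m], x - gmm["mu_x"][m])
        y += post[m] * (gmm["mu_y"][m] + gmm["cov_yx"][m] @ offset)
    return y


def convert_utterance(frames, labels, models):
    """Convert each frame with the GMM of its own phoneme category."""
    return np.array(
        [convert_frame(x, models[lab]) for x, lab in zip(frames, labels)]
    )


def toy_model(target_mean, slope):
    """Hypothetical one-component model: y = target_mean + slope * x."""
    eye = np.eye(2)
    return {
        "weights": np.array([1.0]),
        "mu_x": np.zeros((1, 2)),
        "mu_y": np.full((1, 2), float(target_mean)),
        "cov_xx": eye[None, :, :],
        "cov_yx": slope * eye[None, :, :],
    }


# Hypothetical phoneme categories "a" and "i", each with its own model.
models = {"a": toy_model(1.0, 2.0), "i": toy_model(-1.0, 0.5)}
frames = np.array([[1.0, 2.0], [4.0, 2.0]])
labels = ["a", "i"]
converted = convert_utterance(frames, labels, models)
print(converted)
```

With these toy parameters the first frame is mapped by the "a" model to (3, 5) and the second by the "i" model to (1, 0). Because each frame is converted only by the model of its own phoneme category, the mapping for one category is not averaged with the statistics of the others, which is the intuition behind reducing spectral smoothing.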

