Abstract
This paper proposes a novel algorithm for Chinese voice conversion based on a phonetic Gaussian mixture model. The proposed method converts spectral features separately for each phoneme category using a phonetic Gaussian mixture model, which prevents the spectral smoothing of the traditional Gaussian mixture model (GMM) and avoids phoneme imbalance between training and testing materials, thereby improving voice intelligibility and naturalness. Furthermore, pitch is modified by manipulating the linear prediction residual using knowledge of the instants of significant excitation, which improves the quality of the synthesized speech. In an objective test of spectral similarity to the target voice, the proposed algorithm improved similarity by 9.31% compared with the GMM. In the subjective listening tests, the proposed algorithm was preferred over the baseline algorithm by 10.36% in an ABX test and improved quality by 29.33% in terms of mean opinion score (MOS).
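As an illustration of the per-phoneme conversion described above, the following is a minimal sketch that assumes the conventional joint-density GMM mapping, applied with a separate model for each phoneme category. The notation is an assumption made here for illustration and is not taken from the paper: $c$ indexes the phoneme category, $x$ and $y$ are source and target spectral feature vectors, $M_c$ is the number of mixture components for category $c$, and $\alpha_{c,m}$, $\mu_{c,m}$, $\Sigma_{c,m}$ are the component weights, means, and covariances. The exact formulation used by the authors may differ.

```latex
% Joint-density GMM for phoneme category c, with z = [x^T, y^T]^T
p_c(z) = \sum_{m=1}^{M_c} \alpha_{c,m}\,
         \mathcal{N}\!\left(z;\, \mu_{c,m},\, \Sigma_{c,m}\right),
\qquad
\mu_{c,m} = \begin{bmatrix} \mu^{x}_{c,m} \\ \mu^{y}_{c,m} \end{bmatrix},
\quad
\Sigma_{c,m} = \begin{bmatrix}
  \Sigma^{xx}_{c,m} & \Sigma^{xy}_{c,m} \\
  \Sigma^{yx}_{c,m} & \Sigma^{yy}_{c,m}
\end{bmatrix}

% Conversion function for a source frame x labelled with category c
F_c(x) = \sum_{m=1}^{M_c} P_c(m \mid x)
         \left[ \mu^{y}_{c,m}
              + \Sigma^{yx}_{c,m} \left(\Sigma^{xx}_{c,m}\right)^{-1}
                \left(x - \mu^{x}_{c,m}\right) \right]

% Posterior probability of component m given x, within category c
P_c(m \mid x) =
  \frac{\alpha_{c,m}\, \mathcal{N}\!\left(x;\, \mu^{x}_{c,m},\, \Sigma^{xx}_{c,m}\right)}
       {\sum_{k=1}^{M_c} \alpha_{c,k}\, \mathcal{N}\!\left(x;\, \mu^{x}_{c,k},\, \Sigma^{xx}_{c,k}\right)}
```

Under this sketch, each frame is converted only by the model of its own phoneme category, so the weighted averaging runs over the components of one category rather than over a single model shared by all phonemes.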