Spectrum conversion using prosodic information

Ryo Mochizuki, Tadashi Okubo, Tetsunori Kobayashi

doi:10.1002/scj.20667

Abstract

For speaker conversion based on spectral conversion with a Gaussian mixture model (GMM), a method is proposed that adds prosody-related information to the feature vectors in order to improve conversion precision. In conventional GMM-based spectral conversion, only the spectral parameters themselves are used as input. However, the voice spectrum is generally related to the fundamental frequency during speech, so the quality of the converted voice can be expected to improve if prosodic information is taken into account at conversion time. A precise spectrum conversion method is therefore proposed that assumes application to actual synthesis by rule and performs GMM training using the prosodic information of both the conversion source and the conversion target. The proposed spectrum conversion is also applied to speech conversion in a voice synthesis framework. For this purpose, a method is proposed for preparing triphone joint vectors from a parallel corpus so as to secure training data covering a greater number of prosodic conditions. An objective evaluation using the cepstrum distance indicates that the use of prosodic information is effective in improving the precision of spectrum conversion. A listening evaluation of voice quality and speech characteristics after conversion, comparing a conventional method with the proposed method, indicates that the proposed method is also perceptually effective.

© 2007 Wiley Periodicals, Inc. Syst Comp Jpn, 38(10): 12–20, 2007; Published online in Wiley InterScience (www.interscience.wiley.com). DOI 10.1002/scj.20667
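The idea of feeding prosodic information into a GMM-based spectral mapping can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes the standard joint-density GMM conversion function, F(z) = sum_i p(i|z) [mu_y_i + Sigma_yz_i Sigma_zz_i^{-1} (z - mu_z_i)], where the source vector z is the spectral frame augmented with log-F0. The choice of log-F0 as the prosodic feature, the diagonal source covariances, augmenting only the source side, and every parameter value in `TOY_GMM` are assumptions made for illustration and do not come from the paper.

```python
import math


def log_gauss_diag(z, mean, var):
    """Log density of a diagonal-covariance Gaussian evaluated at z."""
    return -0.5 * sum(
        math.log(2.0 * math.pi * v) + (zi - m) ** 2 / v
        for zi, m, v in zip(z, mean, var)
    )


def posteriors(z, weights, means_z, vars_z):
    """Component responsibilities p(i | z), computed in the log domain."""
    logs = [
        math.log(w) + log_gauss_diag(z, m, v)
        for w, m, v in zip(weights, means_z, vars_z)
    ]
    top = max(logs)
    exps = [math.exp(value - top) for value in logs]
    total = sum(exps)
    return [e / total for e in exps]


def convert_frame(x, log_f0, gmm):
    """Map one source spectral frame to the target spectral space.

    The source vector is augmented with log-F0 (assumed prosodic feature).
    """
    z = list(x) + [log_f0]
    post = posteriors(z, gmm["weights"], gmm["means_z"], gmm["vars_z"])
    dim_y = len(gmm["means_y"][0])
    y = [0.0] * dim_y
    for p, mz, vz, my, cyz in zip(
        post, gmm["means_z"], gmm["vars_z"], gmm["means_y"], gmm["cov_yz"]
    ):
        for r in range(dim_y):
            # Conditional mean mu_y + Sigma_yz Sigma_zz^{-1} (z - mu_z);
            # with diagonal Sigma_zz the inverse is an elementwise division.
            reg = sum(cyz[r][c] / vz[c] * (z[c] - mz[c]) for c in range(len(z)))
            y[r] += p * (my[r] + reg)
    return y


# Hypothetical two-component model: 2 spectral dimensions plus log-F0.
TOY_GMM = {
    "weights": [0.5, 0.5],
    "means_z": [[0.0, 0.0, 4.6], [1.0, 1.0, 5.3]],
    "vars_z": [[1.0, 1.0, 0.04], [1.0, 1.0, 0.04]],
    "means_y": [[0.5, -0.5], [1.5, 0.5]],
    "cov_yz": [
        [[0.5, 0.0, 0.0], [0.0, 0.5, 0.0]],
        [[0.5, 0.0, 0.0], [0.0, 0.5, 0.0]],
    ],
}

frame = [0.5, 0.5]  # spectrally equidistant from both components
y_low = convert_frame(frame, 4.6, TOY_GMM)   # low-pitched context
y_high = convert_frame(frame, 5.3, TOY_GMM)  # high-pitched context
```

In this toy setup the spectral frame alone cannot decide between the two mixture components, so the log-F0 term determines the responsibilities and the same frame is mapped to different target spectra under the two pitch conditions. A spectrum-only GMM would weight both components equally here.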
