Abstract
This paper describes a technique for synthesizing speech with an arbitrary speaker's voice using speaker‐independent speech units, which we call “average voice” models. The proposed method is based on an HMM‐based text‐to‐speech synthesis system in which the spectrum and the pitch parameters are modeled simultaneously using an HMM based on the multi‐space probability distribution (MSD). By appropriately transforming the HMM parameters, the voice characteristics and the prosodic features of synthesized speech can be transformed. We derive an extension of the MLLR algorithm that applies to MSD‐HMMs. We show that by applying speaker adaptation to the pitch model as well as the spectrum model, not only the voice characteristics but also the prosodic features can be adapted using a small number of sentences uttered by a target speaker. A subjective evaluation shows that, with simultaneous adaptation of pitch and spectrum, synthetic speech generated from models adapted from the average voice model using a small number of sentences is very close to that generated from speaker‐dependent models. © 2004 Wiley Periodicals, Inc. Syst Comp Jpn, 35(11): 59–68, 2004; Published online in Wiley InterScience (www.interscience.wiley.com). DOI 10.1002/scj.10332
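The parameter transformation mentioned in the abstract can be illustrated with a minimal sketch. It assumes the standard MLLR form, in which each Gaussian mean is mapped by an affine transform, mu_hat = W [1, mu]. The function name, matrices, and numeric values are hypothetical and chosen only for illustration. Estimating W from adaptation data, and the MSD‐specific extension derived in the paper, are not shown.

```python
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def mllr_transform_mean(W: Matrix, mean: Vector) -> Vector:
    """Apply an MLLR-style affine transform: mu_hat = W @ [1, mu].

    W has shape n x (n + 1); its first column acts as the bias term.
    """
    extended = [1.0] + list(mean)  # extended mean vector [1, mu_1, ..., mu_n]
    return [sum(w * x for w, x in zip(row, extended)) for row in W]


# Hypothetical transforms (assumed values, not taken from the paper).
W_spec = [[0.5, 1.0, 0.0],
          [-0.5, 0.0, 2.0]]   # for a 2-dimensional spectral mean
W_f0 = [[0.25, 1.0]]          # for a 1-dimensional log F0 mean (voiced space)

spectral_mean = [1.0, 2.0]
log_f0_mean = [5.0]

# Adapting both streams mirrors the idea of adapting pitch and spectrum together.
adapted_spec = mllr_transform_mean(W_spec, spectral_mean)
adapted_f0 = mllr_transform_mean(W_f0, log_f0_mean)

print(adapted_spec)  # [1.5, 3.5]
print(adapted_f0)    # [5.25]
```

In this sketch the same function serves both streams. Only the transform matrix differs, which is one simple way to picture adapting the average voice model toward a target speaker.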