Abstract

In this paper we introduce a mixed excitation model into an HMM-based speech synthesis system with the objective of improving the quality of synthesized speech. In previous work we proposed a text-to-speech synthesis system that generates speech parameters from HMMs that model mel-frequency cepstral coefficients, fundamental frequency, and duration. In that system the excitation source driving the synthesis filter (an MLSA filter) was a simple model that switched between a pulse train for voiced intervals and white noise for unvoiced intervals. Such an excitation model cannot synthesize sounds, such as voiced fricatives, that contain both periodic and aperiodic components, and this is one cause of poor synthesized speech quality. Therefore, in this paper we incorporate a mixed excitation model, based on the narrowband vocoding method MELP, that combines a pulse train with white noise, with a view to realizing high-quality speech synthesis. Since this excitation model can be applied to wideband as well as narrowband vocoding, we anticipate that it will prove effective for speech synthesis. In addition, we introduce a postfilter, a technique widely used in vocoding, to improve the quality of the synthesized speech. The results of a subjective evaluation show the effectiveness of the mixed excitation model and the postfilter in this system. © 2005 Wiley Periodicals, Inc. Syst Comp Jpn, 36(12): 43–50, 2005; Published online in Wiley InterScience (www.interscience.wiley.com). DOI 10.1002/scj.20354
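
The contrast between the two excitation models described above can be illustrated with a minimal Python sketch. It is not the implementation used in the paper: MELP-style mixed excitation mixes pulse and noise per frequency band, whereas this sketch assumes a single full-band mixing weight for brevity. The function names and the `voicing` parameter are hypothetical and chosen only for illustration.

```python
import math
import random


def pulse_train(n_samples, period):
    """One pulse every `period` samples, scaled so the average power is about 1."""
    amp = math.sqrt(period)  # matches the power of unit-variance noise
    return [amp if i % period == 0 else 0.0 for i in range(n_samples)]


def white_noise(n_samples, rng):
    """Unit-variance Gaussian white noise drawn from the given random.Random."""
    return [rng.gauss(0.0, 1.0) for _ in range(n_samples)]


def binary_excitation(n_samples, period, voiced, rng):
    """Simple model: pulses for voiced intervals, noise for unvoiced intervals."""
    if voiced:
        return pulse_train(n_samples, period)
    return white_noise(n_samples, rng)


def mixed_excitation(n_samples, period, voicing, rng):
    """Assumed full-band mix: `voicing` in [0, 1] is the share of power
    given to the periodic component; the rest goes to the noise component."""
    pulses = pulse_train(n_samples, period)
    noise = white_noise(n_samples, rng)
    w_pulse = math.sqrt(voicing)
    w_noise = math.sqrt(1.0 - voicing)
    return [w_pulse * p + w_noise * n for p, n in zip(pulses, noise)]


if __name__ == "__main__":
    rng = random.Random(0)
    # A voiced-fricative-like frame: partly periodic, partly noisy.
    frame = mixed_excitation(160, 80, 0.5, rng)
    print(len(frame))
```

With `voicing` set to 1.0 or 0.0 the mixed model reduces to the two cases of the simple switching model, while intermediate values produce a signal containing both a periodic and an aperiodic component, which the switching model cannot represent.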

