HMM-based Korean speech synthesis system for hand-held devices

Sang-Jin Kim, Minsoo Hahn, Jong-Jin Kim

doi:10.1109/tce.2006.273160

Abstract

A speech interface may be the first choice of user interface for robots and for hand-held devices such as personal digital assistants (PDAs) and portable multimedia players (PMPs). However, such devices have limited memory and computational power. Hidden Markov model (HMM)-based speech synthesis is currently considered suitable for embedded systems. In this paper, we describe our HMM-based Korean speech synthesis system. Statistical HMMs for Korean speech units are trained on a hand-labeled speech database that includes contextual information on phonemes, word phrases, and multilevel break strength. Mel-cepstrum and line spectrum pair (LSP) parameters are compared for spectrum modeling, and a two-band excitation based on the harmonic plus noise speech model is used as the mixed excitation source. The resulting small-footprint Korean synthesis system produced speech of considerably high quality with fairly good prosody.
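To make the idea of a two-band mixed excitation concrete, here is a minimal sketch for a single frame. It assumes a single boundary frequency `fv` (a hypothetical "maximum voiced frequency" parameter) below which the excitation is a periodic pulse train at F0 and above which it is white noise, and it performs the band split by masking FFT bins. The function name, sampling rate, frame length, and FFT-domain split are all illustrative assumptions, not the authors' implementation, which is not described in the abstract.

```python
import numpy as np

def two_band_excitation(f0, fv, fs=16000, n=400, seed=0):
    """Hypothetical two-band mixed excitation for one frame.

    f0: fundamental frequency in Hz (0 or less means fully unvoiced)
    fv: assumed boundary ("maximum voiced frequency") in Hz
    fs: sampling rate in Hz; n: frame length in samples
    """
    rng = np.random.default_rng(seed)
    noise = rng.standard_normal(n)
    if f0 <= 0:
        return noise  # unvoiced frame: noise across the whole band

    # Periodic component: one impulse per pitch period, scaled so its
    # average power roughly matches that of unit-variance noise.
    period = fs / f0
    pulses = np.zeros(n)
    pulses[np.arange(0, n, period).astype(int)] = np.sqrt(period)

    # Split at fv: keep pulse-train bins below fv and noise bins above it.
    freqs = np.fft.rfftfreq(n, d=1.0 / fs)
    mixed = np.where(freqs < fv, np.fft.rfft(pulses), np.fft.rfft(noise))
    return np.fft.irfft(mixed, n=n)

# Example: a voiced frame at 120 Hz with an assumed 4 kHz boundary.
e = two_band_excitation(120.0, 4000.0)
print(e.shape)  # (400,)
```

In a full synthesizer this excitation would drive a spectral filter built from the generated mel-cepstrum or LSP parameters; a time-domain filter pair is an equally valid way to realize the band split.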
