Abstract

In this paper, we present a Pali (Thai) speech synthesis system based on the statistical parametric approach. To develop the system, we recorded 40 Pali chants. The recordings were represented by Mel-frequency cepstral coefficients and fundamental frequency (F0) and were labeled by forced alignment. These parameters were modeled using hidden Markov models (HMMs). To synthesize speech, the input text was converted into context-dependent phonemes, speech parameters were generated from the trained HMMs, and the resulting parameters were passed to a speech vocoder. We built two synthesis models: the first represents tone at the syllable level (tone-syllable), and the second represents tone at the phoneme level (tone-phoneme). To evaluate the naturalness of the proposed system, we asked 13 participants to take part in listening tests comparing the two synthesis models (tone-syllable and tone-phoneme) with the original speech. The resulting mean opinion scores (MOS) for naturalness were 4.21, 3.25, and 3.32 (out of 5) for the original, tone-syllable, and tone-phoneme speech, respectively. We also conducted an objective test in which we calculated the cepstral distance between the cepstral coefficients of the original and synthesized speech. The average distances were 3.67 and 3.60 for the tone-syllable and tone-phoneme models, respectively.
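
The two evaluation measures mentioned above can be illustrated with a minimal sketch. The abstract does not state the exact distance formula, the frame alignment method, or the rating protocol, so everything here is an assumption: the function names `mean_opinion_score` and `cepstral_distance` are hypothetical, the distance is taken to be a plain Euclidean distance averaged over frames that are already time-aligned, and the sample values are made up for illustration rather than taken from the study.

```python
import math


def mean_opinion_score(ratings):
    """Average a list of listener ratings (assumed to be on a 1-5 scale)."""
    if not ratings:
        raise ValueError("ratings must not be empty")
    return sum(ratings) / len(ratings)


def cepstral_distance(original, synthesized):
    """Mean Euclidean distance between paired cepstral frames.

    Assumption: both inputs are lists of equal-length coefficient vectors
    and are already time-aligned frame by frame. The paper may use a
    different (for example, scaled or dB-valued) definition.
    """
    if not original or len(original) != len(synthesized):
        raise ValueError("inputs must be non-empty and have the same number of frames")
    total = 0.0
    for ref_frame, syn_frame in zip(original, synthesized):
        if len(ref_frame) != len(syn_frame):
            raise ValueError("paired frames must have the same dimension")
        total += math.sqrt(sum((r - s) ** 2 for r, s in zip(ref_frame, syn_frame)))
    return total / len(original)


# Illustrative values only, not data from the study.
example_ratings = [4, 5, 3, 4]
example_original = [[0.0, 0.0], [1.0, 1.0]]
example_synthesized = [[3.0, 4.0], [1.0, 1.0]]

print(mean_opinion_score(example_ratings))
print(cepstral_distance(example_original, example_synthesized))
```

Under this definition a lower distance means the synthesized cepstra lie closer to the original, which is how the reported averages of 3.67 and 3.60 would be compared.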
