Abstract

Problem statement: Thai has a number of rural dialects. However, four dialects are mainly spoken by Thai people residing in the four core regions: central, north, northeast and south. Recognizing and synthesizing Thai speech across these dialects is consequently difficult. Approach: Prosody is an important factor that must be taken into account, since it affects not only the naturalness but also the intelligibility of speech. To address the problem, speech prosody is carefully preserved by modeling the fundamental frequency (F0) contours. The differences among the model parameters of the four Thai dialects are summarized. This study proposes an analysis of model parameters for Thai speech prosody with four regional dialects and two genders, as preliminary work for speech recognition and synthesis. Fujisaki's model, a powerful tool for modeling the F0 contour, has been adopted. Seven parameters are derived from Fujisaki's model. The first parameter is the baseline frequency, which is the lowest level of the F0 contour. The second and third parameters are the numbers of phrase commands and tone commands, which reflect the frequencies of surges of the utterance at the global and local levels, respectively. The fourth and fifth parameters are the phrase command and tone command durations, which reflect the speed of speaking and the length of a syllable, respectively. The sixth and seventh parameters are the amplitudes of the phrase commands and tone commands, which reflect the energy of the global speech and the energy of the local syllable, respectively. Results: In the experiments, each regional dialect includes 200 samples of one sentence with male and female speech. Therefore our speech database contains 1600 utterances in total. The results showed that most of the proposed parameters can distinguish the four regional dialects explicitly.
Conclusion: Using Fujisaki's model, the results confirm that the proposed parameters can distinguish the regional dialects efficiently. In future research, they are expected to be applied in speech recognition and synthesis with various regional dialect characteristics.
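
As an illustration of the seven parameters listed in the abstract, the following minimal Python sketch summarizes one utterance from a set of already-estimated phrase and tone commands. The names Command and summarize_utterance are hypothetical, and the abstract does not state how durations or amplitudes are measured; the sketch assumes a duration is offset minus onset and that amplitudes are averaged by magnitude.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class Command:
    """One phrase or tone command (hypothetical representation)."""
    onset: float      # start time in seconds
    offset: float     # end time in seconds
    amplitude: float  # command amplitude (may be negative for tone commands)


def _average(values):
    """Mean of an iterable, or 0.0 when it is empty."""
    values = list(values)
    return mean(values) if values else 0.0


def summarize_utterance(baseline_hz, phrase_cmds, tone_cmds):
    """Return the seven per-utterance parameters as a dictionary.

    Assumption: duration = offset - onset, amplitude = mean magnitude.
    """
    return {
        "baseline_hz": baseline_hz,
        "n_phrase_commands": len(phrase_cmds),
        "n_tone_commands": len(tone_cmds),
        "phrase_duration_s": _average(c.offset - c.onset for c in phrase_cmds),
        "tone_duration_s": _average(c.offset - c.onset for c in tone_cmds),
        "phrase_amplitude": _average(abs(c.amplitude) for c in phrase_cmds),
        "tone_amplitude": _average(abs(c.amplitude) for c in tone_cmds),
    }


if __name__ == "__main__":
    # Invented example values, not data from the study.
    phrases = [Command(0.0, 1.0, 0.5)]
    tones = [Command(0.1, 0.3, 0.4), Command(0.5, 0.9, -0.2)]
    for name, value in summarize_utterance(100.0, phrases, tones).items():
        print(name, value)
```

Averaging such dictionaries per dialect and gender would give the kind of parameter comparison the abstract describes, although the study's actual procedure may differ.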

Highlights

  • An appropriate modeling of the F0 contour contributes to the effectiveness of speech processing, such as speech recognition, speech synthesis and speech coding

  • Speech processing of Thai dialects has not been studied, despite the variety of dialects spread over the four regions of Thailand

  • In the Northern region of Thailand, the Thai dialect of “Lanna” or “Kammuang” is widely used; the Lao-style Thai dialect is spoken in the North Eastern region; and the South Thai dialect is spoken generally in the Southern part of Thailand

Summary

Introduction

An appropriate modeling of the F0 contour contributes to the effectiveness of speech processing, such as speech recognition, speech synthesis and speech coding. Fujisaki’s modeling of fundamental frequency for Thai expressive speech, conducted in 2010, proved effective for a limited-domain speech corpus (Chomphan, 2010). Following the same approach used for Thai expressive speech (Chomphan, 2010), this study proposes an analysis of F0 modeling of four Thai dialects: standard Thai, the Lanna or North dialect, the Lao-style or North East dialect and the South dialect. The extension of Fujisaki’s model is mainly used; this is a preliminary study for advanced research in speech synthesis and recognition, such as the expressive speech synthesis work in the Japanese language (Tachibana et al., 2005; 2006).
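
To make the F0 modeling concrete, the sketch below generates a contour using the commonly cited formulation of Fujisaki's model: the logarithm of F0 is the sum of the log baseline frequency, impulse-driven phrase components and stepwise tone components. This is an assumption-laden illustration rather than the study's implementation. The function names and the constants ALPHA, BETA and GAMMA are illustrative choices, and the extension of the model used for Thai dialects may differ in detail.

```python
import math

# Assumed illustrative constants; the study's values are not given here.
ALPHA = 2.0   # natural angular frequency of the phrase control mechanism (1/s)
BETA = 20.0   # natural angular frequency of the tone control mechanism (1/s)
GAMMA = 0.9   # ceiling of the tone component


def phrase_response(t, alpha=ALPHA):
    """Impulse response of the phrase control mechanism (zero before onset)."""
    return alpha * alpha * t * math.exp(-alpha * t) if t >= 0 else 0.0


def tone_response(t, beta=BETA, gamma=GAMMA):
    """Step response of the tone control mechanism (zero before onset)."""
    if t < 0:
        return 0.0
    return min(1.0 - (1.0 + beta * t) * math.exp(-beta * t), gamma)


def f0_contour(times, baseline_hz, phrase_cmds, tone_cmds):
    """Compute F0 in Hz at each time in `times`.

    phrase_cmds: list of (onset_s, amplitude)
    tone_cmds:   list of (onset_s, offset_s, amplitude); negative amplitudes
                 lower F0, positive amplitudes raise it.
    """
    contour = []
    for t in times:
        log_f0 = math.log(baseline_hz)
        for onset, amp in phrase_cmds:
            log_f0 += amp * phrase_response(t - onset)
        for onset, offset, amp in tone_cmds:
            log_f0 += amp * (tone_response(t - onset) - tone_response(t - offset))
        contour.append(math.exp(log_f0))
    return contour


if __name__ == "__main__":
    # Invented command values for illustration only.
    times = [i * 0.05 for i in range(30)]
    f0 = f0_contour(times, 100.0, [(0.0, 0.5)], [(0.2, 0.5, 0.3), (0.7, 1.0, -0.2)])
    for t, hz in zip(times, f0):
        print(f"{t:.2f} s  {hz:.1f} Hz")
```

Before any command begins, the contour sits at the baseline frequency; each phrase command adds a slowly decaying global rise, and each tone command adds a local rise or fall over its duration.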
