Abstract

Fundamental frequency (F0) conveys the prosodic information of human speech. Modeling the F0 of the dialects of a particular language is a vital issue that should be taken into account. Four main dialects are spoken in different regions of Thailand: the central, northern, northeastern and southern regions. Another important issue is environmental noise, which is often perceived in daily life and degrades speech quality. The robustness of F0 modeling techniques can be evaluated by studying the effects of noise on Thai dialects. The structural model was chosen for this study. Four types of background noise, each at five different power levels, are applied. The F0 synthesized from the structural model is compared with the F0 of natural speech under different scenarios, including noise type, noise level, speech dialect and speaker gender. From the experimental results, the root mean square errors between the synthesized F0 and the natural F0 are calculated. When the noise level increases, the root mean square error decreases. Among the noise types, air-conditioner noise gives the highest root mean square error, while train noise gives the lowest. For male speech, the root mean square errors of the central and northeastern dialects are rather higher than those of the northern and southern dialects. For female speech, the northern dialect has the smallest deviation among all dialects. Across genders, female speech gives a higher root mean square error than male speech for all noise types and all noise power levels. With the structural model, the results confirm that the Thai dialects respond to the proposed model differently. Moreover, all four types of simulated noise deteriorate the F0 contours of the dialects differently.
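The comparison metric named in the abstract, the root mean square error between a synthesized and a natural F0 contour, can be illustrated with a minimal sketch. The function name `f0_rmse`, the frame-aligned list representation, and the convention that unvoiced frames are marked with F0 = 0 and skipped are all assumptions made for illustration; the abstract does not state how the study aligned contours or handled unvoiced frames.

```python
import math


def f0_rmse(natural_f0, synthesized_f0):
    """Return the RMSE (in Hz) over frames that are voiced in both contours.

    Assumption: both contours are frame-aligned lists of F0 values in Hz,
    and an unvoiced frame is marked with a value of 0.
    """
    if len(natural_f0) != len(synthesized_f0):
        raise ValueError("contours must have the same number of frames")
    squared_errors = [
        (n - s) ** 2
        for n, s in zip(natural_f0, synthesized_f0)
        if n > 0 and s > 0  # skip frames unvoiced in either contour
    ]
    if not squared_errors:
        raise ValueError("no frame is voiced in both contours")
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Hypothetical contours; the third frame is unvoiced in the natural speech.
natural = [120.0, 125.0, 0.0, 130.0]
synthesized = [118.0, 129.0, 110.0, 130.0]
print(f0_rmse(natural, synthesized))
```

Under these assumptions, a lower value means that the synthesized contour follows the natural contour more closely in the given noise condition.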

Highlights

  • In noisy environments, the effects of various types of noise need to be taken into account for human speech communication

  • Among the noise types, air-conditioner noise gives the highest root mean square error, while train noise gives the lowest

  • Among the female speech dialects, the northern dialect has the smallest deviation
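One common way to apply background noise at a controlled level is to scale a noise recording so that the mixture reaches a target signal-to-noise ratio (SNR). The sketch below shows that idea only. The use of SNR in decibels, the function name `mix_at_snr`, and the sample values are assumptions; the source states that five power levels were used but does not say how they were defined.

```python
import math


def mix_at_snr(speech, noise, snr_db):
    """Add noise to speech, scaled so the mixture has the requested SNR in dB.

    Assumption: speech and noise are equal-length lists of samples.
    """
    if len(speech) != len(noise):
        raise ValueError("speech and noise must have the same length")
    speech_power = sum(x * x for x in speech) / len(speech)
    noise_power = sum(x * x for x in noise) / len(noise)
    if noise_power == 0:
        raise ValueError("noise signal has zero power")
    # Choose the gain so that speech_power / (gain**2 * noise_power)
    # equals 10 ** (snr_db / 10).
    gain = math.sqrt(speech_power / (noise_power * 10 ** (snr_db / 10)))
    return [s + gain * n for s, n in zip(speech, noise)]


# Hypothetical toy signals, mixed at a 10 dB SNR.
speech = [1.0, -1.0, 1.0, -1.0]
noise = [0.5, 0.5, -0.5, -0.5]
noisy = mix_at_snr(speech, noise, 10.0)
```

Repeating the mix for each noise type and each level would give the grid of conditions over which the F0 error is compared.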


Summary

Introduction

The effects of various types of noise need to be taken into account for human speech communication. Modeling the speech F0 contour in the presence of noise degrades the intelligibility and naturalness of the speech (Chomphan and Kobayashi, 2007a). It is important to study how noise degrades the model parameters and the resulting F0 in modern speech processing systems. Structural modeling of fundamental frequency for Thai tones, conducted in 2012, was shown to be effective for a limited-domain speech tone corpus (Chomphan, 2012). It can be seen that the structural model parameters can be used to model Thai tones appropriately. Fujisaki’s modeling of F0 contours for Thai dialects has been conducted by Chomphan (2010b). It has been noted that Thai dialect speech corpus

