Abstract

Problem statement: Expressive speech in Thai has been studied for only a short time. An important feature of speech is the fundamental frequency (F0), which defines the prosody of human speech and can be used to distinguish between several types of expressive speech. A previous study concluded that environmental noise affects the F0 contour of Thai dialects. However, the prosodic information of Thai speech across various speaking styles and several types of noise has not been studied. Approach: Four speaking styles were used, and four types of environmental noise were recorded at different power levels. The speech and noise recordings were then mixed. F0 contours were extracted for each speaking style, noise type and noise level, and the Root Mean Square Error (RMSE) between the F0 contour of the clean speech and that of the noise-corrupted speech was calculated. Results: The experiments covered four types of noise: train, factory, car and air conditioner. Each speaking style included 10 samples of 10 utterances of male and female speech. Five noise levels, ranging from 0-20 dB relative to the clean speech, were used. The different types of noise were observed to have different effects, and the four speaking styles also produced different RMSEs. Conclusion: The recorded noises deteriorate the F0 contours for all types of speaking styles in Thai.
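The approach described above (mix recorded noise into clean speech at several levels, then compare F0 contours by RMSE) can be sketched as follows. This is a minimal illustration, not the authors' implementation. Several points are assumptions: the "0-20 dB" levels are read as signal-to-noise ratios, the five levels are taken to be evenly spaced, the noise is looped to cover the utterance, and the RMSE is computed only over frames that are voiced (F0 > 0) in both contours. The names `mix_at_snr`, `f0_rmse` and `SNR_LEVELS_DB` are hypothetical, and F0 extraction itself is left to whatever pitch tracker is available.

```python
import math

# Assumed levels: five evenly spaced SNRs covering the stated 0-20 dB range.
SNR_LEVELS_DB = (0, 5, 10, 15, 20)


def mean_power(signal):
    """Average power (mean of squared samples) of a sequence of samples."""
    return sum(x * x for x in signal) / len(signal)


def mix_at_snr(speech, noise, snr_db):
    """Add noise to speech, scaled so the speech-to-noise power ratio is snr_db."""
    # Loop the noise so that it covers the whole utterance (assumed handling).
    looped = [noise[i % len(noise)] for i in range(len(speech))]
    # Noise power that yields the requested SNR for this utterance.
    target_noise_power = mean_power(speech) / (10.0 ** (snr_db / 10.0))
    gain = math.sqrt(target_noise_power / mean_power(looped))
    return [s + gain * n for s, n in zip(speech, looped)]


def f0_rmse(f0_clean, f0_noisy):
    """RMSE in Hz between two frame-aligned F0 contours.

    Frames where either contour is unvoiced (F0 <= 0) are skipped; this
    selection rule is an assumption, not taken from the paper.
    """
    pairs = [(c, n) for c, n in zip(f0_clean, f0_noisy) if c > 0 and n > 0]
    if not pairs:
        return float("nan")
    return math.sqrt(sum((c - n) ** 2 for c, n in pairs) / len(pairs))
```

In use, each clean utterance would be passed through `mix_at_snr` once per noise type and per level in `SNR_LEVELS_DB`, the F0 contours of the clean and mixed signals would be extracted with the same pitch tracker, and `f0_rmse` would give one value per condition.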

Highlights

  • In human speech production, the fundamental frequency (F0) is a crucial feature known to carry the frequency contour of speech

  • This study concentrates on expressive speech in four styles (anger, sadness, enjoyment and reading); the four selected noise types are air-conditioner, car, factory and train noise

  • This study analyzes the differences between the fundamental frequency of clean expressive speech and that of noise-corrupted expressive speech in terms of Root Mean Square Error (RMSE)
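For reference, the RMSE between two frame-aligned F0 contours is conventionally defined as below. The symbols are assumptions introduced here for illustration: $N$ is the number of frames compared, and $F0_{\mathrm{clean}}(i)$ and $F0_{\mathrm{noisy}}(i)$ are the F0 values in frame $i$ of the clean and noise-corrupted speech. How the study selects the compared frames (for example, voiced frames only) is not stated in this summary.

```latex
\mathrm{RMSE} = \sqrt{\frac{1}{N}\sum_{i=1}^{N}\bigl(F0_{\mathrm{clean}}(i) - F0_{\mathrm{noisy}}(i)\bigr)^{2}}
```

An RMSE of zero means the noise left the extracted contour unchanged; larger values indicate a stronger deterioration of the F0 contour.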


Summary

INTRODUCTION

The fundamental frequency (F0) is a crucial feature known to carry the frequency contour of speech. A recent study on modeling the F0 contour in noisy environments found that simulated noise deteriorates the parameters of Fujisaki’s model (Fujisaki and Sudo, 1971; Mixdorff and Fujisaki, 1997; Seresangtakul and Takara, 2003). However, the direct effect of noise on the fundamental frequency contour of expressive speech has not been studied. This study analyzes the differences between the fundamental frequency of clean expressive speech and that of noise-corrupted expressive speech in terms of RMSE.

Corresponding Author: Suphattharachai Chomphan, Department of Electrical Engineering, Faculty of Engineering at Si Racha, Kasetsart University, 199 M.6, Tungsukhla, Si Racha, Chonburi, 20230 Thailand

MATERIALS AND METHODS
RESULTS
DISCUSSION
CONCLUSION
