Fujisaki’s Model of Thai’s Fundamental Frequency Contours with Environmental Noises

Edgar Edgar

doi:10.3844/ajassp.2012.1251.1258

Abstract

Problem statement: The fundamental frequency (F0) contour is an important feature of human speech because it represents speech prosody and indicates the naturalness and intelligibility of the speech. Modeling the F0 contour is therefore an essential step in natural speech processing. In speech communication, environmental noise is a major cause of degraded digital communication quality. This study examines the effects of noise on F0 contour modeling for standard Thai. Approach: The modeling technique was adapted from Fujisaki’s model because of its success in modeling various Thai speech units. Four types of environmental noise were recorded at different power levels. The study analyzes a set of parameters for modeling Thai speech prosody across two genders and four noise types. The derived Fujisaki model covers seven parameters: baseline frequency, the number of phrase commands, the number of tone commands, phrase command duration, tone command duration, phrase command amplitude and tone command amplitude. Results: The experiments used standard Thai speech consisting of 2 samples of 5 sentences spoken by 5 male and 5 female speakers. The four noise types were train, factory, car and air conditioner, and each was applied at five levels ranging from 0 to 20 dB. The results show that the different noises have distinct effects on most of the proposed model parameters. Conclusion: The results confirm that the effects of the four noise types differ significantly. Empirically, environmental noise degrades the model parameters.

Highlights

In earlier work, modeling the F0 contour in a noisy environment was found to degrade the naturalness of the speech
For Thai speech, Fujisaki’s model has been applied effectively to utterances, words and tones (Seresangtakul and Takara, 2002; 2003; Hiroya and Sumio, 2002)
It has been seen that the selected model parameters are able to distinguish all styles of expressive speech

Summary

INTRODUCTION

In earlier work, modeling the F0 contour in a noisy environment was found to degrade the naturalness of the speech. To develop a modern natural speech processing system, it is very important to know how noise degrades the model parameters. For Thai speech, Fujisaki’s model has been applied effectively to utterances, words and tones (Seresangtakul and Takara, 2002; 2003; Hiroya and Sumio, 2002). The fundamental frequency of Thai expressive speech was successfully modeled with a limited-domain speech database in 2010 (Chomphan, 2010a), and Fujisaki’s modeling of F0 contours for Thai dialects was reported by Chomphan (2010b). This study follows the same approach as that earlier work, analyzing F0 contour modeling of standard Thai under four different types of noise.
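To make the model parameters concrete, the sketch below evaluates the classical command-response form of Fujisaki’s model, in which ln F0(t) is the sum of ln Fb (the baseline frequency), the responses to impulse-like phrase commands and the responses to step-like tone commands. It is only an illustration under stated assumptions: the function names, the time constants (alpha = 3.0, beta = 20.0), the ceiling gamma = 0.9 and the command values in the usage lines are illustrative choices, not values from this study. The version adapted here for Thai may differ in detail; for example, the classical form treats phrase commands as impulses without a duration.

```python
import math


def phrase_response(t, alpha=3.0):
    """Impulse response of the phrase control mechanism, Gp(t)."""
    if t < 0:
        return 0.0
    return alpha * alpha * t * math.exp(-alpha * t)


def tone_response(t, beta=20.0, gamma=0.9):
    """Step response of the tone control mechanism, Ga(t), capped at gamma."""
    if t < 0:
        return 0.0
    return min(1.0 - (1.0 + beta * t) * math.exp(-beta * t), gamma)


def fujisaki_f0(t, fb, phrase_cmds, tone_cmds,
                alpha=3.0, beta=20.0, gamma=0.9):
    """Return F0 in Hz at time t (seconds).

    fb          -- baseline frequency in Hz
    phrase_cmds -- list of (onset_time, amplitude)
    tone_cmds   -- list of (onset_time, offset_time, amplitude);
                   a negative amplitude lowers F0
    All default constants are assumed, illustrative values.
    """
    log_f0 = math.log(fb)
    for t0, ap in phrase_cmds:
        log_f0 += ap * phrase_response(t - t0, alpha)
    for t1, t2, aa in tone_cmds:
        log_f0 += aa * (tone_response(t - t1, beta, gamma)
                        - tone_response(t - t2, beta, gamma))
    return math.exp(log_f0)


# Hypothetical commands: one phrase command and two tone commands.
phrase_cmds = [(0.0, 0.5)]
tone_cmds = [(0.10, 0.30, 0.4), (0.40, 0.60, -0.3)]

# Sample the contour every 10 ms over one second.
contour = [fujisaki_f0(i * 0.01, 120.0, phrase_cmds, tone_cmds)
           for i in range(100)]
```

In this form, the baseline frequency, the number of commands, the command timings and the command amplitudes are the quantities a parameter analysis would compare between clean and noisy conditions.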

MATERIALS AND METHODS

CONCLUSION