Effects of Noises on Fujisaki's Model of Fundamental Frequency Contours for Thai Dialects

Suphattharachai Chomphan, Chutarat Chompunth

doi:10.3844/ajassp.2012.1684.1693

Abstract

Problem statement: Modeling of the fundamental frequency (F0) contour plays an important role in natural speech processing, since F0 is a key speech feature that defines human speech prosody. Thai has four main dialects, spoken in the four core regions of the country: central, north, northeast and south. Environmental noise also plays an important role in corrupting speech quality. Studying the effects of noise on F0 contour modeling for Thai dialects evaluates the robustness of the modeling technique. Approach: Fujisaki's model was selected for this study because of its success in modeling various Thai speech units. Four types of environmental noise are simulated at different power levels, and the differences among the model parameters of the four Thai dialects are summarized. The study proposes an analysis of model parameters for Thai speech prosody across four regional dialects, two genders and four types of noise. Seven parameters are derived from Fujisaki's model. The first is the baseline frequency, which is the lowest level of the F0 contour. The second and third are the numbers of phrase commands and tone commands, which reflect the frequency of surges in the utterance at the global and local levels, respectively. The fourth and fifth are the phrase command and tone command durations, which reflect the speaking rate and the length of a syllable, respectively. The sixth and seventh are the amplitudes of the phrase commands and tone commands, which reflect the energy of the global speech and the energy of the local syllable, respectively. Results: In the experiments, each regional dialect includes 10 samples of 10 sentences with male and female speech. The four noise types are train, factory, car and air conditioner, and each is applied at five levels ranging from 0 to 20 dB. The results show that most of the proposed parameters distinguish the four regional dialects explicitly. Conclusion: With Fujisaki's model, the results confirm that the proposed parameters distinguish the regional dialects efficiently. However, the simulated noise deteriorates the F0 contours and distorts the model parameters.
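
As a rough illustration of how phrase commands and tone commands shape an F0 contour, the Python sketch below evaluates the commonly cited form of Fujisaki's model, in which ln F0(t) is the sum of the log baseline frequency, the phrase command impulse responses and the tone command step responses. The function names, the constants alpha, beta and gamma, and the example command values are assumptions chosen for illustration; they are not taken from the study.

```python
import numpy as np

def phrase_response(t, alpha=2.0):
    """Phrase control impulse response: alpha^2 * t * exp(-alpha * t) for t >= 0, else 0."""
    tp = np.maximum(t, 0.0)  # clamping to 0 makes the response vanish before the onset
    return alpha ** 2 * tp * np.exp(-alpha * tp)

def tone_response(t, beta=20.0, gamma=0.9):
    """Tone control step response: min(1 - (1 + beta*t) * exp(-beta*t), gamma) for t >= 0, else 0."""
    tp = np.maximum(t, 0.0)
    return np.minimum(1.0 - (1.0 + beta * tp) * np.exp(-beta * tp), gamma)

def fujisaki_f0(t, fb, phrase_cmds, tone_cmds, alpha=2.0, beta=20.0, gamma=0.9):
    """Return F0(t) in Hz.

    fb          : baseline frequency in Hz
    phrase_cmds : list of (onset_s, amplitude)
    tone_cmds   : list of (onset_s, offset_s, amplitude); amplitude may be negative
    """
    t = np.asarray(t, dtype=float)
    log_f0 = np.full_like(t, np.log(fb), dtype=float)
    for t0, ap in phrase_cmds:
        log_f0 += ap * phrase_response(t - t0, alpha)
    for t1, t2, aa in tone_cmds:
        log_f0 += aa * (tone_response(t - t1, beta, gamma)
                        - tone_response(t - t2, beta, gamma))
    return np.exp(log_f0)

# Hypothetical utterance: one phrase command and two tone commands.
t = np.linspace(0.0, 2.0, 201)
f0 = fujisaki_f0(t, fb=100.0,
                 phrase_cmds=[(0.0, 0.5)],
                 tone_cmds=[(0.3, 0.6, 0.4), (0.8, 1.1, -0.3)])
print(f0.min(), f0.max())
```

In this sketch the baseline frequency, the counts of phrase and tone commands, the tone command durations (offset minus onset) and the command amplitudes can be read directly from the arguments. How the study measures phrase command duration is not stated in the abstract, so it is left out here.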

Highlights

Modeling the F0 contour in a noisy environment degrades the naturalness of the speech
All of the derived parameters have been extracted for four regional Thai dialects: standard Thai, Lanna (North dialect), Lao-style (Northeast dialect) and South dialect
The sentences have been recorded in four Thai dialects: standard Thai (Center-dialect), Lanna Thai (North-dialect), Lao-style Thai (Northeast-dialect) and South Thai (South-dialect)

Summary

Introduction

Modeling the F0 contour in a noisy environment degrades the naturalness of the speech. To develop a natural speech processing system, it is necessary to know how noise deteriorates the model parameters. Fujisaki's modeling of fundamental frequency for Thai expressive speech, conducted in 2010, proved effective for a limited-domain speech corpus. Fujisaki's modeling of F0 contours for Thai dialects was conducted by Chomphan (2010a; 2010b). Following the same approach as the earlier Thai dialect work, which did not consider the various types of noise (Chomphan, 2010b), this study proposes an analysis of the F0 modeling of Chomphan (2010a). The extension of Fujisaki's model, which is preliminary work toward advanced research in speech synthesis and recognition, is the main technique selected in this study.
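
To make the noise simulation concrete, the Python sketch below scales a noise recording so that the mixture reaches a requested signal-to-noise ratio. It assumes that the five noise levels between 0 and 20 dB are SNR values in 5 dB steps, which the text does not state explicitly. The function name `mix_at_snr` and the synthetic test signals are hypothetical stand-ins for the recorded speech and the train, factory, car and air conditioner noises.

```python
import numpy as np

def mix_at_snr(speech, noise, snr_db):
    """Add noise to speech so that the speech-to-noise power ratio equals snr_db."""
    speech = np.asarray(speech, dtype=float)
    noise = np.asarray(noise, dtype=float)
    # Repeat or trim the noise so that it covers the whole utterance.
    reps = int(np.ceil(len(speech) / len(noise)))
    noise = np.tile(noise, reps)[:len(speech)]
    p_speech = np.mean(speech ** 2)
    p_noise = np.mean(noise ** 2)
    if p_noise == 0.0:
        raise ValueError("noise signal has zero power")
    # Choose the gain so that p_speech / (gain^2 * p_noise) = 10^(snr_db / 10).
    gain = np.sqrt(p_speech / (p_noise * 10.0 ** (snr_db / 10.0)))
    return speech + gain * noise

# Synthetic stand-ins: a 120 Hz tone as "speech" and white noise as "noise".
fs = 16000
speech = np.sin(2.0 * np.pi * 120.0 * np.arange(fs) / fs)
noise = np.random.default_rng(0).standard_normal(5000)

# Assumed levels: 0, 5, 10, 15 and 20 dB SNR.
noisy_versions = {snr: mix_at_snr(speech, noise, snr) for snr in (0, 5, 10, 15, 20)}
```

Each noisy version could then be passed through the same F0 extraction and model fitting as the clean speech, so that the shift in each model parameter is compared level by level.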

Methods

Results

Discussion

Conclusion