Developing a Child Friendly Text-to-Speech System

Agnes Jacob, P Mythili

doi:10.1155/2008/597971

Abstract

This paper discusses the implementation details of a child friendly, good quality English text-to-speech (TTS) system that is phoneme-based, concatenative, easy to set up and use, and low in memory requirements. Direct waveform concatenation and linear prediction coding (LPC) are used. Most existing TTS systems are unit-selection based and use standard speech databases available in neutral adult voices. Here, reduced memory is achieved by concatenating phonemes and by replacing phonetic wave files with their LPC coefficients. Linguistic analysis, rather than signal processing techniques, was used to reduce the algorithmic complexity. A sufficient degree of customization and generalization was included through the provision for vocabulary and voice selection to suit the needs of the child user. Prosody was also incorporated. This inexpensive TTS system was implemented in MATLAB, with the synthesis presented by means of a graphical user interface (GUI), thus making it child friendly. It can be used not only as an interesting language learning aid for the normal child but also as a speech aid for the vocally disabled child. The quality of the synthesized speech was evaluated using the mean opinion score (MOS).
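The memory saving from replacing a phoneme's wave file with its LPC coefficients can be illustrated with a minimal Python sketch (the paper's own implementation is in MATLAB and is not reproduced here). The sketch uses the standard autocorrelation method with the Levinson-Durbin recursion. The predictor order, the frame length, and the synthetic test frame are assumptions chosen for illustration, not values taken from the paper.

```python
def autocorrelation(frame, max_lag):
    """Return the autocorrelation values r[0..max_lag] of a list of samples."""
    n = len(frame)
    return [sum(frame[i] * frame[i + lag] for i in range(n - lag))
            for lag in range(max_lag + 1)]


def levinson_durbin(r, order):
    """Solve the LPC normal equations.

    Returns (a, err) where the predictor is
    s[n] ~ a[0]*s[n-1] + a[1]*s[n-2] + ... and err is the residual energy.
    """
    a = [0.0] * order
    err = r[0]
    for i in range(order):
        # Reflection coefficient for this stage of the recursion.
        acc = r[i + 1] - sum(a[k] * r[i - k] for k in range(i))
        k_i = acc / err
        new_a = a[:]
        new_a[i] = k_i
        for k in range(i):
            new_a[k] = a[k] - k_i * a[i - 1 - k]
        a = new_a
        err *= (1.0 - k_i * k_i)
    return a, err


def lpc(frame, order):
    """LPC analysis of one frame by the autocorrelation method."""
    return levinson_durbin(autocorrelation(frame, order), order)


def damped_resonance(a1, a2, n_samples):
    """Synthetic stand-in for a recorded frame: the impulse response of
    s[n] = a1*s[n-1] + a2*s[n-2] + impulse[n]."""
    out = []
    for i in range(n_samples):
        s1 = out[i - 1] if i >= 1 else 0.0
        s2 = out[i - 2] if i >= 2 else 0.0
        out.append((1.0 if i == 0 else 0.0) + a1 * s1 + a2 * s2)
    return out


# Assumed values for illustration only: a 400-sample frame, predictor order 2.
frame = damped_resonance(1.3, -0.8, 400)
coeffs, residual_energy = lpc(frame, order=2)
print("LPC coefficients:", [round(c, 3) for c in coeffs])
print(len(frame), "samples replaced by", len(coeffs) + 1, "stored values")
```

Storing only the coefficients and a gain term per frame, instead of every sample, is what reduces the inventory size. Real speech would need a higher predictor order and frame-by-frame analysis.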

Highlights

Various critical factors must be considered when designing a TTS system that produces intelligible speech.
The implementation details of a child friendly, phoneme-based concatenative TTS system are presented; the system offers a sufficient degree of customization and uses linguistic analysis to circumvent most of the problems of existing concatenative systems.
A voice conversion feature is incorporated in this TTS system using the linear prediction coding (LPC) method, with provision for varying the voice quality over a wide range by varying the F0 values in the synthesis stage.
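The idea of changing the voice by varying F0 at the synthesis stage can be sketched in Python as follows (the paper's implementation is in MATLAB). This is a simplified source-filter illustration, not the authors' method: an impulse train at the chosen F0 excites an all-pole filter defined by LPC coefficients. The coefficient values, sample rate, duration, and F0 values are hypothetical.

```python
def impulse_train(f0_hz, sample_rate, n_samples):
    """Voiced excitation: one unit impulse per pitch period."""
    period = max(1, round(sample_rate / f0_hz))
    return [1.0 if i % period == 0 else 0.0 for i in range(n_samples)]


def all_pole_filter(excitation, coeffs, gain=1.0):
    """LPC synthesis filter: y[n] = gain*x[n] + sum_k coeffs[k] * y[n-1-k]."""
    out = []
    for n, x in enumerate(excitation):
        acc = gain * x
        for k, a in enumerate(coeffs):
            if n - 1 - k >= 0:
                acc += a * out[n - 1 - k]
        out.append(acc)
    return out


def synthesize(coeffs, f0_hz, sample_rate=8000, duration_s=0.05):
    """Synthesize a voiced segment with the same filter at a chosen F0."""
    n_samples = round(sample_rate * duration_s)
    excitation = impulse_train(f0_hz, sample_rate, n_samples)
    return all_pole_filter(excitation, coeffs)


# Hypothetical LPC coefficients standing in for one stored phoneme.
vocal_tract = [1.3, -0.8]

# Same spectral envelope, two different pitches (assumed F0 values).
lower_pitch = synthesize(vocal_tract, f0_hz=120)
higher_pitch = synthesize(vocal_tract, f0_hz=250)
print(len(lower_pitch), len(higher_pitch))
```

Because the filter coefficients stay fixed while only the excitation period changes, one stored set of coefficients can be rendered at several pitches. Unvoiced sounds would need a noise excitation, which this sketch omits.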

Summary

Introduction

There are various critical factors to be considered while designing a TTS system that will produce intelligible speech. The first crucial step in the design of any concatenative TTS system is to select the most appropriate units or segments that result in smooth concatenation. This involves a tradeoff between longer and shorter units. In addition to being part of a computationally manageable inventory of items, the synthesis segments chosen should capture all the transient and transitional information. Capturing this transitional information has been emphasized throughout this work, which in turn contributed to the smooth concatenation of speech segments in this TTS system.
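A minimal Python sketch of phoneme-based direct concatenation is given here to make the unit-inventory idea concrete (the paper's system is written in MATLAB). Everything in it is hypothetical: the two-word lexicon, the ARPAbet-style phoneme symbols, and the sine-tone placeholders that stand in for recorded phoneme units.

```python
import math

SAMPLE_RATE = 8000  # assumed sample rate in Hz


def placeholder_unit(freq_hz, n_samples=400):
    """Stand-in for a recorded phoneme waveform (a plain sine tone)."""
    return [math.sin(2 * math.pi * freq_hz * i / SAMPLE_RATE)
            for i in range(n_samples)]


# Hypothetical pronunciation lexicon: word -> phoneme symbols.
LEXICON = {
    "cat": ["K", "AE", "T"],
    "sat": ["S", "AE", "T"],
}

# Hypothetical unit inventory: one stored waveform per phoneme.
UNITS = {
    "K": placeholder_unit(300),
    "S": placeholder_unit(500),
    "T": placeholder_unit(400),
    "AE": placeholder_unit(700, n_samples=800),
}


def text_to_phonemes(text):
    """Map each word to its phoneme sequence using the lexicon."""
    phonemes = []
    for word in text.lower().split():
        if word not in LEXICON:
            raise KeyError(f"word not in lexicon: {word}")
        phonemes.extend(LEXICON[word])
    return phonemes


def concatenate(phonemes):
    """Direct waveform concatenation: join stored units end to end."""
    wave = []
    for symbol in phonemes:
        wave.extend(UNITS[symbol])
    return wave


phonemes = text_to_phonemes("cat sat")
speech = concatenate(phonemes)
print(phonemes, len(speech), "samples from", len(UNITS), "stored units")
```

The sketch shows the short-unit side of the tradeoff: four stored units cover both words, but all transitional information between neighbouring sounds has to be carried by the units themselves, since nothing is done at the joins.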

Methods

Results

Conclusion