Text‐to‐speech system for English and Japanese

Kenji Matsui, Hector Javkin, Masaaki Kitano, Kazue Hate, Noriyo Hara, Hisashi Wakita

doi:10.1121/1.2026233

Abstract

A real‐time text‐to‐speech system for English and Japanese has been developed. The system consists of a language processing module, a phonetic acoustic processing module, and a synthesis module, and it can convert general English and Japanese sentences to speech. The Japanese and English software are independent except for the synthesis module. The features of the system are as follows. (1) The synthesis module is a phoneme‐based cascade‐parallel formant synthesizer with high observed intelligibility (73.5% for the 119 Japanese monosyllables). (2) The system has a 3000‐morpheme English dictionary and a 40 000‐word Japanese dictionary with a high‐speed search algorithm. (3) A large speech database was collected for the development of Japanese prosody rules. (4) The Fujisaki model was adopted for precise control of the pitch contour. (5) Of the two systems developed, one can stand alone; the other requires a personal computer with a high‐speed DSP board. (6) Powerful interactive tools for varying speech parameters in real time were also developed along with the system.
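
As an illustration of feature (4), the following is a minimal sketch of the Fujisaki model in its commonly published form, in which the logarithm of the fundamental frequency is a baseline plus the summed responses to phrase commands (impulses) and accent commands (steps). It is not the authors' implementation: the function names, the argument layout, and the default constants (alpha = 3.0/s, beta = 20.0/s, gamma = 0.9) are assumptions chosen for illustration, and the abstract does not state which values the system used.

```python
import math


def phrase_response(t, alpha=3.0):
    """Phrase-control impulse response: alpha^2 * t * exp(-alpha * t) for t >= 0."""
    if t < 0:
        return 0.0
    return alpha * alpha * t * math.exp(-alpha * t)


def accent_response(t, beta=20.0, gamma=0.9):
    """Accent-control step response: min(1 - (1 + beta*t) * exp(-beta*t), gamma) for t >= 0."""
    if t < 0:
        return 0.0
    return min(1.0 - (1.0 + beta * t) * math.exp(-beta * t), gamma)


def fujisaki_f0(t, fb, phrases, accents, alpha=3.0, beta=20.0, gamma=0.9):
    """Return F0 in Hz at time t (seconds).

    fb      : baseline frequency in Hz
    phrases : list of (onset_time, amplitude) phrase commands
    accents : list of (onset_time, offset_time, amplitude) accent commands
    All parameter defaults are illustrative assumptions.
    """
    log_f0 = math.log(fb)
    for onset, amplitude in phrases:
        log_f0 += amplitude * phrase_response(t - onset, alpha)
    for onset, offset, amplitude in accents:
        log_f0 += amplitude * (
            accent_response(t - onset, beta, gamma)
            - accent_response(t - offset, beta, gamma)
        )
    return math.exp(log_f0)


if __name__ == "__main__":
    # Hypothetical commands: one phrase command at 0 s, one accent from 0.2 s to 0.5 s.
    phrases = [(0.0, 0.5)]
    accents = [(0.2, 0.5, 0.4)]
    for i in range(0, 11):
        t = i * 0.1
        print(f"{t:.1f} s  {fujisaki_f0(t, 120.0, phrases, accents):.1f} Hz")
```

Because the commands are summed in the log-frequency domain, each phrase or accent command scales F0 multiplicatively, and before any command begins the contour sits at the baseline frequency.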
