Method of controlling high-speed reading in a text-to-speech conversion system

Keiichi Chihara

doi:10.1121/1.3359250

Abstract

A method of high-speed reading in a text-to-speech conversion system including a text analysis module (101) for generating a phoneme and prosody character string from an input text; a prosody generation module (102) for generating synthesis parameters, comprising at least a voice segment, a phoneme duration, and a fundamental frequency, for the phoneme and prosody character string; and a speech generation module (103) for generating a synthetic waveform by waveform superimposition with reference to a voice segment dictionary (105). The prosody generation module is provided with both a duration rule table containing empirically found phoneme durations and a duration prediction table containing phoneme durations predicted by statistical analysis. When the user-designated utterance speed exceeds a threshold, the module uses the duration rule table to determine the phoneme duration; when the threshold is not exceeded, it uses the duration prediction table.
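
The table-switching behaviour described in the abstract can be sketched roughly as follows. This is only a minimal illustration under stated assumptions: the table contents, the threshold value, the unit of utterance speed, and the function names `select_duration_table` and `phoneme_durations` are all hypothetical and do not come from the source.

```python
# Hypothetical tables mapping a phoneme to a duration in milliseconds.
# The values are illustrative placeholders, not data from the source.
DURATION_RULE_TABLE = {"a": 80.0, "k": 50.0, "s": 70.0}        # empirically found
DURATION_PREDICTION_TABLE = {"a": 92.5, "k": 47.3, "s": 76.1}  # statistically predicted

# Assumed threshold, expressed as a multiple of normal utterance speed.
SPEED_THRESHOLD = 2.0


def select_duration_table(utterance_speed, threshold=SPEED_THRESHOLD):
    """Return the rule table above the threshold, else the prediction table."""
    if utterance_speed > threshold:
        return DURATION_RULE_TABLE
    return DURATION_PREDICTION_TABLE


def phoneme_durations(phonemes, utterance_speed, threshold=SPEED_THRESHOLD):
    """Look up a duration for each phoneme in the table selected by speed."""
    table = select_duration_table(utterance_speed, threshold)
    return [table[p] for p in phonemes]


if __name__ == "__main__":
    print(phoneme_durations(["k", "a"], utterance_speed=1.0))
    print(phoneme_durations(["k", "a"], utterance_speed=3.0))
```

In this sketch a speed exactly equal to the threshold still selects the prediction table, matching the wording "when the threshold is not exceeded". How the selected durations are then shortened for high-speed reading is not specified in the abstract and is left out here.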

Full Text