Abstract

A method of high-speed reading in a text-to-speech conversion system. The system includes a text analysis module (101) that generates a phoneme and prosody character string from input text; a prosody generation module (102) that generates synthesis parameters, comprising at least a voice segment, a phoneme duration, and a fundamental frequency, for that string; and a speech generation module (103) that generates a synthetic waveform by waveform superimposition, referring to a voice segment dictionary (105). The prosody generation module holds two tables: a duration rule table containing empirically found phoneme durations, and a duration prediction table containing phoneme durations predicted by statistical analysis. When the user-designated utterance speed exceeds a threshold, the module determines phoneme durations from the duration rule table; otherwise, it uses the duration prediction table.
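The table-selection rule in the abstract can be illustrated with a minimal Python sketch. Everything concrete here is a hypothetical placeholder, not taken from the patent: the phoneme symbols, the millisecond durations, the threshold value, the speed units, and the linear scaling of durations by speed. The sketch only shows the switch the abstract describes. A speed strictly above the threshold selects the rule table, and any speed at or below it selects the prediction table.

```python
from typing import Dict, List, Tuple

# Hypothetical per-phoneme durations (ms) at a reference speed; illustrative only.
DURATION_RULE_TABLE: Dict[str, float] = {"a": 90.0, "k": 60.0, "s": 80.0}
DURATION_PREDICTION_TABLE: Dict[str, float] = {"a": 100.0, "k": 55.0, "s": 85.0}

# Hypothetical threshold and reference speed, in arbitrary speed units.
SPEED_THRESHOLD = 600.0
REFERENCE_SPEED = 400.0


def select_duration_table(speed: float) -> Dict[str, float]:
    """Use the rule table when speed exceeds the threshold, else the prediction table."""
    if speed > SPEED_THRESHOLD:
        return DURATION_RULE_TABLE
    return DURATION_PREDICTION_TABLE


def phoneme_durations(phonemes: List[str], speed: float) -> List[Tuple[str, float]]:
    """Look up each phoneme's duration and scale it to the requested speed.

    Linear scaling by REFERENCE_SPEED / speed is an assumption for illustration.
    """
    table = select_duration_table(speed)
    scale = REFERENCE_SPEED / speed
    return [(p, table[p] * scale) for p in phonemes]


if __name__ == "__main__":
    print(phoneme_durations(["k", "a"], 800.0))  # above threshold: rule table
    print(phoneme_durations(["k", "a"], 400.0))  # below threshold: prediction table
```

With these placeholder values, a speed of 800 selects the rule table and halves each duration, giving `[('k', 30.0), ('a', 45.0)]`. A speed of 400 selects the prediction table unscaled, giving `[('k', 55.0), ('a', 100.0)]`.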

