Abstract

A technique has been developed for controlling the rhythm of synthetic speech in Japanese speech synthesis‐by‐rule, an aspect to which little attention has been given so far. In Japanese, syllables are generally believed to be the basic units of rhythm, with each syllable pronounced almost isochronously. Listening tests showed that each syllable contains a portion that is important for recognizing the syllable, and that the positioning of this portion determines the rhythm. This portion was termed the auditory perceptual timing point (APTP), and the APTP of each syllable was determined in listening tests. Most APTPs were found near the voice onset, in close agreement with the result obtained by Sato [H. Sato, Trans. Comm. Speech Res., ASJ, S77‐31, 1–8 (1977)]. In principle, the rhythm pattern was determined by the number of morae in individual words and by the syntactic structure of the input text, although further investigation is needed to construct detailed rules. It has been confirmed that employing this rhythm‐control technique improves the quality of synthetic speech.
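The abstract does not give the placement rule itself. The following is only a minimal illustrative sketch that assumes the simplest possible reading: syllable onsets are scheduled so that the APTPs, rather than the syllable starts, fall on a uniform (isochronous) grid. It further assumes that each syllable's APTP offset from its own start is already known, for example a point near the voice onset. The names `Syllable`, `Placement`, `place_syllables`, and `mora_interval_ms`, the one-syllable-per-mora grid, and the 150 ms interval in the usage example are all hypothetical. The sketch also ignores the word-level and syntactic adjustments that the abstract says shape the actual rhythm pattern.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Syllable:
    label: str          # e.g. a romanized syllable such as "ka"
    duration_ms: float  # natural (unadjusted) duration of the syllable
    aptp_ms: float      # hypothetical APTP offset from the syllable start


@dataclass
class Placement:
    label: str
    onset_ms: float     # scheduled start time of the syllable
    duration_ms: float  # adjusted duration so consecutive syllables abut


def place_syllables(syllables: List[Syllable], mora_interval_ms: float,
                    first_aptp_ms: Optional[float] = None) -> List[Placement]:
    """Schedule syllables so that APTP k lands at first_aptp_ms + k * mora_interval_ms.

    Assumption (not from the source): one syllable per grid slot, uniform interval.
    """
    if not syllables:
        return []
    if first_aptp_ms is None:
        # Let the first syllable start at t = 0.
        first_aptp_ms = syllables[0].aptp_ms
    # Onset that puts each syllable's APTP on the isochronous grid.
    onsets = [first_aptp_ms + k * mora_interval_ms - s.aptp_ms
              for k, s in enumerate(syllables)]
    placements = []
    for k, s in enumerate(syllables):
        if k + 1 < len(syllables):
            # Stretch or shrink the syllable so it ends where the next one starts.
            dur = onsets[k + 1] - onsets[k]
            if dur <= s.aptp_ms:
                raise ValueError(f"interval too short to keep APTP inside {s.label!r}")
        else:
            dur = s.duration_ms  # last syllable keeps its natural length
        placements.append(Placement(s.label, onsets[k], dur))
    return placements


if __name__ == "__main__":
    # Hypothetical values for illustration only.
    sylls = [Syllable("ka", 150.0, 60.0),
             Syllable("ta", 140.0, 50.0),
             Syllable("na", 130.0, 20.0)]
    for p in place_syllables(sylls, mora_interval_ms=150.0):
        print(p)
```

With these made-up numbers, the APTPs fall at 60, 210, and 360 ms, exactly 150 ms apart. The syllable onsets (0, 160, 340 ms) and the adjusted durations (160, 180, 130 ms) are not uniform. This is the distinction the abstract draws: rhythm is carried by where the perceptual timing point falls, not by where each syllable begins.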
