Abstract
The fully automatic generation of a prosodic structure for a text-to-speech system has remained an elusive task. Most systems still depend on extensive user-specified timing and F0 markers to improve the approximation of a natural prosody. A three-stage processing system that provides fully automatic prosodic control for a wide variety of French declarative sentences has been developed. Perceptual testing has demonstrated a high degree of acceptability, as well as an excellent resistance to degradation induced by telephone transmission. The initial phonological processing phase, comprising phrase identification, liaison, chaining, and syllabification rules, produces a phonemic chain marked for prosodic, lexical, and syllable boundaries. The second processing phase calculates the temporal structure on the basis of a statistical model incorporating some 25 segmental and suprasegmental parameters. The final phase specifies accented and nonaccented syllables and applies Fujisaki modeling for the generation of the F0 contour. The system has been integrated into a TTS system that functions in real time on high-end personal computers. Examples are available on the laboratory’s ftp site (ftp://eliot.unil.ch/pub/LAIP/speech/LAIPTTS). This report will concentrate on the derivation of the relevant control parameters. [Work supported by FNRS, OFES.]
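As an illustration of the final phase, the following is a minimal sketch of how an F0 contour can be computed from the Fujisaki model in its standard formulation, where log F0 is the sum of a baseline, phrase components, and accent components. It is not the system's implementation: the function names, the command tuples, and the constants (alpha = 3/s, beta = 20/s, ceiling gamma = 0.9) are assumptions taken from commonly quoted textbook values, not from this report.

```python
import math

def phrase_response(t, alpha=3.0):
    """Impulse response of the phrase control mechanism (zero before onset)."""
    if t < 0:
        return 0.0
    return alpha * alpha * t * math.exp(-alpha * t)

def accent_response(t, beta=20.0, gamma=0.9):
    """Step response of the accent control mechanism, capped at gamma."""
    if t < 0:
        return 0.0
    return min(1.0 - (1.0 + beta * t) * math.exp(-beta * t), gamma)

def fujisaki_f0(t, base_hz, phrase_cmds, accent_cmds,
                alpha=3.0, beta=20.0, gamma=0.9):
    """Return F0 in Hz at time t (seconds).

    phrase_cmds: list of (onset_s, amplitude) tuples      (hypothetical layout)
    accent_cmds: list of (onset_s, offset_s, amplitude)   (hypothetical layout)
    """
    # The model is additive in the log-frequency domain.
    log_f0 = math.log(base_hz)
    for onset, amp in phrase_cmds:
        log_f0 += amp * phrase_response(t - onset, alpha)
    for onset, offset, amp in accent_cmds:
        log_f0 += amp * (accent_response(t - onset, beta, gamma)
                         - accent_response(t - offset, beta, gamma))
    return math.exp(log_f0)

# Illustrative values only: one phrase command and one accented syllable.
phrase_cmds = [(0.0, 0.5)]
accent_cmds = [(0.20, 0.40, 0.3)]
contour = [fujisaki_f0(i * 0.01, 100.0, phrase_cmds, accent_cmds)
           for i in range(100)]
```

In this sketch, an accented syllable would contribute one accent command spanning its duration, and each prosodic phrase would contribute one phrase command at its onset; how the system actually derives those commands from the temporal structure is not specified in the abstract.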