Abstract
The new AT&T TTS system for general U.S. English text is based on best-choice components picked from the AT&T Flextalk TTS, the Festival System from the University of Edinburgh, and ATR’s CHATR system. From Flextalk, it employs text normalization, letter-to-sound, and (optionally) baseline prosody generation. Festival provides general software-engineering infrastructure (modularity) for easy experimentation and competitive evaluation of different algorithms or modules. Finally, CHATR’s unit selection was modified to guarantee the intelligibility of a good n-phone (n=2 would be diphone) synthesizer while improving significantly on perceived naturalness relative to Flextalk. Each decision made during the research and development phase of this system was based on formal subjective evaluations. For example, the best voice found in a test that compared TTS systems built from several speakers gave a 0.3-point head start (on a 5-point rating scale) in quality over the mean of all speakers. Similarly, using our Harmonic-plus-Noise speech representation gave us a 0.25-point advantage over standard TD-PSOLA. In addition, we gained 0.4 points in overall quality by not performing prosodic modifications on the units (other than some smoothing across concatenation points) and instead using the system-generated prosody as a target in unit selection. In conclusion, the new system combines the best of the rule-based and data-driven worlds in TTS technology to deliver on the long-standing promise of truly natural-sounding synthesis.
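The idea of using system-generated prosody as a target in unit selection, rather than modifying the selected units, can be illustrated with a minimal sketch. This is not the CHATR or AT&T implementation: the names `Unit`, `target_cost`, `join_cost`, and `select_units`, the cost weights, and the use of pitch mismatch as the only join cost are all assumptions made for illustration. The sketch runs a Viterbi-style search that picks, for each target phone, the recorded unit that best balances closeness to the predicted prosody against smoothness at the concatenation point.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Unit:
    """A recorded candidate unit (hypothetical, simplified features)."""
    phone: str
    f0: float        # mean pitch in Hz
    duration: float  # length in seconds


def target_cost(unit, target_f0, target_dur, w_f0=1.0, w_dur=100.0):
    # Distance between the unit's own prosody and the predicted prosody.
    # The weights are illustrative assumptions, not values from the paper.
    return w_f0 * abs(unit.f0 - target_f0) + w_dur * abs(unit.duration - target_dur)


def join_cost(left, right):
    # Stand-in for a concatenation cost: pitch mismatch at the join.
    return abs(left.f0 - right.f0)


def select_units(targets, inventory):
    """Pick one unit per target by dynamic programming.

    targets:   list of (phone, predicted_f0, predicted_duration)
    inventory: dict mapping phone -> non-empty list of Unit candidates
    """
    if not targets:
        return []
    prev = []  # (accumulated cost, path) for each candidate of the previous target
    for i, (phone, f0, dur) in enumerate(targets):
        cur = []
        for unit in inventory[phone]:
            tc = target_cost(unit, f0, dur)
            if i == 0:
                cur.append((tc, [unit]))
            else:
                # Cheapest way to reach this unit from any previous candidate.
                cost, path = min(
                    ((c + join_cost(p[-1], unit), p) for c, p in prev),
                    key=lambda item: item[0],
                )
                cur.append((cost + tc, path + [unit]))
        prev = cur
    return min(prev, key=lambda item: item[0])[1]
```

Because the selected units are concatenated as recorded, the predicted prosody only steers the search; in this sketch no signal modification is applied to the chosen units.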