Abstract

In contrast to knowledge-based techniques such as articulatory or rule-based formant synthesis, data-based approaches to speech synthesis derive synthetic voices from a corpus of recorded human speech. Current “unit selection” data-based systems can produce synthetic utterances that rival human speech in intelligibility and naturalness. They do so by concatenating units of various sizes drawn from large, searchable corpora of speech aligned with tags that capture a rich set of segmental and prosodic features assigned to phonemic or subphonemic units. Emerging data-based techniques instead use machine learning algorithms to derive the parameters of parametric speech synthesizers automatically. Whatever the approach, because the extensive corpus of speech used to construct most modern data-based systems is recorded from a single talker, the system’s synthetic output captures much of that individual’s vocal identity. This property has opened up new areas of application, such as personalized synthetic voices and “voice banking” for patients who might lose the ability to speak. This talk will trace the development of current data-based approaches, describe applications of this synthesis technology, and discuss some of the major challenges driving current research efforts. [Some work supported by NIDCD Grant R42-DC006193.]
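To make the unit-selection mechanism concrete, the sketch below shows the standard target-cost/join-cost formulation with a Viterbi-style search: each target slot has candidate units from the corpus, a target cost penalizes mismatch with the requested segmental/prosodic tags, and a join cost penalizes discontinuity between consecutive units. This is a minimal illustration of the general technique, not the system described in the talk; the Unit class, feature names such as "stress" and "f0", and the cost weights are all hypothetical.

```python
# Minimal sketch of unit-selection synthesis (illustrative assumptions only).
from dataclasses import dataclass, field

@dataclass
class Unit:
    phone: str                                     # phonemic label from the corpus
    features: dict = field(default_factory=dict)   # e.g. {"stress": 1, "f0": 120.0}

def target_cost(spec: dict, unit: Unit) -> float:
    """Mismatch between the requested segmental/prosodic tags and the unit's tags."""
    return sum(1.0 for key, want in spec.items()
               if unit.features.get(key) != want)

def join_cost(prev: Unit, cur: Unit) -> float:
    """Toy concatenation penalty: pitch discontinuity at the join point."""
    return abs(prev.features.get("f0", 0.0) - cur.features.get("f0", 0.0)) / 50.0

def select_units(target_specs, candidates):
    """Viterbi search: candidates[i] is the list of corpus units for slot i."""
    n = len(target_specs)
    cost = [[0.0] * len(c) for c in candidates]   # best cost ending at (slot, unit)
    back = [[0] * len(c) for c in candidates]     # backpointers for the trace-back
    for j, u in enumerate(candidates[0]):
        cost[0][j] = target_cost(target_specs[0], u)
    for i in range(1, n):
        for j, u in enumerate(candidates[i]):
            best_k, best = min(
                ((k, cost[i - 1][k] + join_cost(p, u))
                 for k, p in enumerate(candidates[i - 1])),
                key=lambda kv: kv[1])
            cost[i][j] = best + target_cost(target_specs[i], u)
            back[i][j] = best_k
    # Trace back the lowest-cost unit sequence.
    j = min(range(len(candidates[-1])), key=lambda k: cost[-1][k])
    path = [j]
    for i in range(n - 1, 0, -1):
        j = back[i][j]
        path.append(j)
    path.reverse()
    return [candidates[i][j] for i, j in enumerate(path)]
```

Real systems weight many more features and measure join cost on spectral frames at the unit boundaries, but the search structure is the same dynamic program over target and join costs.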
