Abstract

In contrast to knowledge-based techniques such as articulatory or rule-based formant synthesis, data-based approaches to speech synthesis derive synthetic voices from a corpus of recorded human speech. Current "unit selection" data-based systems can produce synthetic utterances that rival human speech in intelligibility and naturalness. They do so by concatenating units of various sizes drawn from large, searchable corpora of speech, aligned with tags that capture a rich set of segmental and prosodic features assigned to phonemic or subphonemic units. Emerging data-based techniques use machine learning algorithms to derive the parameters of parametric speech synthesizers automatically. Whatever the approach, because the extensive speech corpus used to construct most modern data-based systems is recorded from a single talker, the system's synthetic output captures much of that individual's vocal identity. This property has opened new areas of application, such as personalized synthetic voices and "voice banking" for patients who may lose the ability to speak. This talk will trace the development of current data-based approaches, describe applications of this synthesis technology, and discuss some of the major challenges driving current research efforts. [Some work supported by NIDCD Grant R42-DC006193.]
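Unit selection as described above is commonly formulated as a search problem: for each target segment, pick one candidate unit from the corpus so that the total of a target cost (how well a unit's features match the desired segment) plus a join cost (how smoothly adjacent units concatenate) is minimized, typically with a Viterbi-style dynamic program. The sketch below illustrates that framing only; the toy "units" (name, pitch) pairs and the simple absolute-difference costs are illustrative assumptions, not part of any real synthesis system.

```python
# Minimal sketch of unit selection: dynamic-programming search that
# minimizes the sum of per-unit target costs and pairwise join costs.
# All data and cost functions here are hypothetical toy values.

def select_units(targets, candidates, target_cost, join_cost):
    """Return one candidate unit per target position, minimizing
    total target cost + join cost via a Viterbi-style search."""
    n = len(targets)
    # best[i][j] = (cumulative cost of best path ending in candidate j
    #               at position i, backpointer to previous candidate)
    best = [[(target_cost(targets[0], c), None) for c in candidates[0]]]
    for i in range(1, n):
        row = []
        for c in candidates[i]:
            tc = target_cost(targets[i], c)
            # cheapest way to reach c from any previous candidate
            prev_cost, prev_j = min(
                (best[i - 1][k][0] + join_cost(pc, c), k)
                for k, pc in enumerate(candidates[i - 1])
            )
            row.append((prev_cost + tc, prev_j))
        best.append(row)
    # backtrace from the cheapest final candidate
    j = min(range(len(best[-1])), key=lambda k: best[-1][k][0])
    path = []
    for i in range(n - 1, -1, -1):
        path.append(candidates[i][j])
        j = best[i][j][1]
    return list(reversed(path))

# Toy usage: units are (label, pitch) pairs; costs compare pitch only.
targets = [("a", 100), ("b", 120)]
candidates = [[("a1", 95), ("a2", 130)], [("b1", 118), ("b2", 90)]]
tcost = lambda t, c: abs(t[1] - c[1])
jcost = lambda u, v: abs(u[1] - v[1])
print(select_units(targets, candidates, tcost, jcost))
# → [('a1', 95), ('b1', 118)]
```

Real systems use much richer feature vectors (segmental identity, prosodic context, spectral continuity at the joins) and far larger candidate inventories, but the search structure is the same.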
