Abstract

Diphone-based speech synthesis has attracted interest for many years, and interest in unit selection synthesis has recently grown rapidly, as illustrated by the attention given to the CHATR system. The limits of both approaches are well known. Diphone-based systems are generally highly intelligible, but the resulting signals do not sound completely natural. There are several reasons for this, among them the limited number of phone variants present in a typical system and the cost of concatenating at diphone boundaries. Unit selection synthesis, which is typically phone-based, can produce sentences from a large database that sound surprisingly natural and intelligible. However, quality is often inconsistent, and the main difficulties appear to lie in selecting units from a large database that are both acoustically appropriate and have the correct prosodic characteristics. Typically, no prosody modification is performed. In an effort to capture the best features of both approaches, a unit selection and synthesis algorithm has been devised that allows finer control than the CHATR system (version 0.8), both by applying selective prosody modification and by exercising finer control over which units are chosen for synthesis. Results of experiments based on this version of unit selection synthesis will be presented.
