Abstract

This paper describes perceptual methods for diagnosing problems in text-to-speech systems. Special attention is paid to two issues. The first is coverage of the domain of a text-to-speech system. Because this domain involves an enormous range of contexts, it is critical for diagnostics, and also for overall evaluation, that test materials cover this range to the fullest extent possible. Automatic text generation methods that make extensive use of "greedy" algorithms are described that serve this purpose. The second is that speech generated by text-to-speech systems tends to have a great variety of problems. A battery of experimental paradigms is discussed that addresses different facets of speech quality and intelligibility. Included are: (a) a "word pointing" method for detecting problematic concatenative units; (b) a "minimal pairs intelligibility test", an expanded diagnostic rhyme test; (c) an automatically scored orthographic name transcription task; (d) a mean opinion score paradigm with problem categorization; and (e) a paired comparison paradigm with strength-of-choice rating. The methods are applied in a series of experiments on high-end text-to-speech systems.
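
The abstract does not spell out the greedy procedure, so the following is only a minimal sketch of one common form of greedy coverage selection, not the authors' algorithm. It assumes each candidate sentence has already been mapped to the set of units it contains; the function name `greedy_select`, the sentence identifiers, and the unit labels are all hypothetical. At each step the sentence covering the most still-uncovered units is chosen.

```python
from typing import Dict, List, Optional, Set


def greedy_select(
    candidates: Dict[str, Set[str]],
    max_sentences: Optional[int] = None,
) -> List[str]:
    """Greedily pick sentences until every unit is covered (or a cap is hit).

    `candidates` maps a sentence id to the set of units it contains.
    Both the ids and the unit inventory are illustrative assumptions.
    """
    # Units that still need to appear in at least one selected sentence.
    uncovered: Set[str] = set().union(*candidates.values())
    remaining = dict(candidates)
    selected: List[str] = []

    while uncovered and remaining:
        if max_sentences is not None and len(selected) >= max_sentences:
            break
        # Sort ids first so that ties are broken deterministically.
        best = max(sorted(remaining), key=lambda s: len(remaining[s] & uncovered))
        gain = remaining[best] & uncovered
        if not gain:
            break  # no remaining sentence adds new coverage
        selected.append(best)
        uncovered -= gain
        del remaining[best]

    return selected


# Hypothetical toy inventory: four sentences and the units each contains.
candidates = {
    "s1": {"a-b", "b-c", "c-d"},
    "s2": {"a-b", "d-e"},
    "s3": {"b-c"},
    "s4": {"d-e", "e-f"},
}

print(greedy_select(candidates))  # ['s1', 's4']
```

Greedy selection does not guarantee the smallest possible test set, but it is simple and scales to large candidate pools, which is the usual reason for choosing it when the range of contexts to cover is very large.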
