Abstract

Previous research comparing the perception of natural and synthetic speech has demonstrated that the intelligibility of synthetic speech is significantly worse than that of natural speech. The present experiment was designed to investigate the effects of speech rate, pitch contour, and sentence meaning on the perception of fluent synthetic speech. The stimuli were sentences that were either syntactically correct and meaningful or syntactically correct but semantically anomalous. The sentences were generated by a Telesensory Prose‐2000 text‐to‐speech system at either 150 or 250 words per minute. Half of the sentences were generated with a flat pitch (monotone) and half with “normal” clausal intonation. Subjects were instructed to identify the sentences by writing down what they heard. The percentage of words correctly identified was determined for each experimental condition, and the distribution of identification errors in each condition was examined. The results indicate that the intelligibility of synthetic speech is influenced primarily by speech rate and by the meaningfulness of the sentences. These results have important implications for the design and application of text‐to‐speech systems, as well as for theories of fluent speech perception. [Work supported by NIH.]
