Abstract

The study investigated the segmental intelligibility of four text-to-speech (TTS) products under 0 dB and 5 dB signal-to-noise ratios in a group of native and nonnative speakers of English. Each product—AT&T Next-Gen™, Festival version 1.4.2, FlexVoice™ 2, and IBM ViaVoice™ Version 5.1—uses a different algorithm for generating speech from text. The results, which benefit developers of TTS technology as well as developers of products that utilize TTS, showed that (1) all TTS products were less intelligible to nonnative speakers of English than native speakers, (2) the “hybrid” TTS product that combined concatenative and formant synthesis methods was the least intelligible of the four products investigated, (3) the remaining three products, which used formant, concatenative diphone based LPC, and concatenative waveform synthesis methods respectively, were equally intelligible to nonnative speakers, (4) none of the four TTS products was better at resisting intelligibility loss due to noise than others, and (5) listening to currently available unrestricted TTS under high noise conditions would probably require a greater amount of cognitive resources on the part of both native and nonnative speakers of English and may be difficult when other demanding activities are concurrently performed.

Full Text
Paper version not known

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call

Disclaimer: All third-party content on this website/platform is and will remain the property of their respective owners and is provided on "as is" basis without any warranties, express or implied. Use of third-party content does not indicate any affiliation, sponsorship with or endorsement by them. Any references to third-party content is to identify the corresponding services and shall be considered fair use under The CopyrightLaw.