Abstract

To explain the differences in performance obtained with natural and synthetic speech at different linguistic levels over the telephone line, we analyzed data collected in an experiment in which 108 randomized stimuli were presented to 96 subjects. Subjects were required to identify the consonant in 51 CV and 57 VCV meaningful or meaningless words. There were 20 listening conditions: 10 systems, each presented in both good and telephone conditions. The 10 systems comprised 6 TTS systems (3 formant-based (SF) and 3 diphone-based (SD)), a pure natural voice (NV), and 3 signal-to-noise (S/N) ratios (6, 0, and -6 dB). A comparison of consonant confusions for natural and synthetic speech at comparable overall levels of intelligibility showed that the distributions of confusions were often quite different in each condition. Analyses of spectrograms suggest that such confusions are due to problems in the phonetic rules and to the telephone line.

