Abstract

Mosaic speech is degraded speech that is segmented into time × frequency blocks. Earlier research with Japanese mosaic speech has shown that its intelligibility is almost perfect for mosaic block durations (MBDs) up to 40 ms. The purpose of the present study was to investigate the intelligibility of English mosaic speech, and whether its intelligibility would vary when the speech was compressed in time, preserved, or stretched in time. Furthermore, we investigated whether intelligibility differed between native and non-native speakers of English. English (n = 19), Indonesian (n = 19), and Chinese (n = 20) listeners participated in an experiment in which the mosaic speech stimuli were presented and the listeners typed what they had heard. The results showed that compressing or stretching the English mosaic speech resulted in similar trends in intelligibility among the three language groups, with some exceptions. Generally, intelligibility was higher for MBDs of 20 and 40 ms after preserving or stretching, and it decreased beyond MBDs of 80 ms after stretching. Compression also lowered intelligibility. This suggests that humans can extract new information from individual speech segments of about 40 ms, but that there is a limit to the amount of linguistic information that can be conveyed within a block of about 40 ms or below.
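To make the idea of time × frequency blocks concrete, the following is a minimal sketch of block averaging over a band-by-frame power matrix. It is an illustration under stated assumptions, not the signal chain used in the study: the text here does not specify how the stimuli were synthesized, and the function names (`mosaicize`, `mbd_to_frames`), the frame length, and the plain-list data layout are all hypothetical.

```python
def mbd_to_frames(mbd_ms, frame_ms):
    """Convert a mosaic block duration (ms) to a number of analysis frames.

    `frame_ms` is an assumed analysis frame length, not a value from the study.
    """
    return max(1, round(mbd_ms / frame_ms))


def mosaicize(power, frames_per_block, bands_per_block=1):
    """Replace every time x frequency block of `power` with the block mean.

    power: list of rows, one row per frequency band and one column per
    time frame (a hypothetical layout chosen for this sketch).
    """
    n_bands = len(power)
    n_frames = len(power[0]) if n_bands else 0
    out = [[0.0] * n_frames for _ in range(n_bands)]
    for b0 in range(0, n_bands, bands_per_block):
        b1 = min(b0 + bands_per_block, n_bands)
        for t0 in range(0, n_frames, frames_per_block):
            t1 = min(t0 + frames_per_block, n_frames)
            # Collect all cells in this block and average them.
            cells = [power[b][t] for b in range(b0, b1) for t in range(t0, t1)]
            mean = sum(cells) / len(cells)
            # Write the same mean into every cell of the block.
            for b in range(b0, b1):
                for t in range(t0, t1):
                    out[b][t] = mean
    return out


# Two bands, four frames, blocks of two frames each.
example = mosaicize([[1, 3, 5, 7], [2, 2, 4, 4]], frames_per_block=2)
# example == [[2.0, 2.0, 6.0, 6.0], [2.0, 2.0, 4.0, 4.0]]
```

In this simplified picture, a longer block duration means more frames are averaged together, so finer temporal detail within each block is lost.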

Highlights

  • In daily life, we often need to interpret speech that is interrupted or accompanied by other sounds. Various studies have been performed to investigate how humans are able to interpret speech when it is spoken in a noisy environment [1,2], or under reverberation [2,3].

  • A wealth of research has been performed on temporal aspects of speech processing, using speech in which parts of the signal were segmented or omitted.

  • For the Chinese group, for both original mosaic block durations (OMBDs), there were no significant differences in intelligibility between the preserved condition and the compressed or stretched conditions.


Summary

Introduction

Various studies have been performed to investigate how humans are able to interpret speech when it is spoken in a noisy environment [1,2], or under reverberation [2,3]. Despite the silent gaps, listeners could still extract some meaning from the signals. Further studies on such “gated speech” showed that even if the 50-ms silent gaps were removed and the remaining speech portions were contracted, the speech could still be intelligible [5,6]. Besides periodically interrupted speech, processing of distorted speech has further been investigated with speech that was temporally smeared [7,8] or temporally reversed [9,10]. Of particular interest is the perception of locally time-reversed speech.
