Abstract

Temporal resolution needed for Japanese speech communication was measured. A new experimental paradigm that can reflect the spectro-temporal resolution necessary for healthy listeners to perceive speech is introduced. As a first step, we report listeners' intelligibility scores of Japanese speech with a systematically degraded temporal resolution, so-called “mosaic speech”: speech mosaicized in the coordinates of time and frequency. The results of two experiments show that mosaic speech cut into short static segments was almost perfectly intelligible with a temporal resolution of 40 ms or finer. Intelligibility dropped for a temporal resolution of 80 ms, but was still around 50%-correct level. The data are in line with previous results showing that speech signals separated into short temporal segments of <100 ms can be remarkably robust in terms of linguistic-content perception against drastic manipulations in each segment, such as partial signal omission or temporal reversal. The human perceptual system thus can extract meaning from unexpectedly rough temporal information in speech. The process resembles that of the visual system stringing together static movie frames of ~40 ms into vivid motion.

Highlights

  • Speech can remain considerably intelligible even when it is drastically manipulated in the temporal domain

  • When speech is cut into segments of 50 ms, for example, and if each segment is reversed in time, intelligibility is still quite well preserved (Steffen and Werani, 1994; Saberi and Perrott, 1999; Ueda et al, 2017)

  • Neuroscientific research performed in the last decade has suggested that different types of neural oscillations are involved in the segmentation and organization of speech into perceptual units (Giraud and Poeppel, 2012; Chait et al, 2015)

Read more

Summary

Introduction

Speech can remain considerably intelligible even when it is drastically manipulated in the temporal domain. The perception of locally time-reversed speech is to be noted. When speech is cut into segments of 50 ms, for example, and if each segment is reversed in time, intelligibility is still quite well preserved (Steffen and Werani, 1994; Saberi and Perrott, 1999; Ueda et al, 2017). Neural oscillations at modulation frequencies around 30–50 Hz, corresponding to a temporal resolution around 20–33 ms, are considered to be involved in phonemic processing. If this temporal resolution is attained, the human auditory system should receive sufficient information to grasp the rhythmic intensity fluctuations in speech and music (Ding et al, 2017)

Methods
Results
Conclusion
Full Text
Paper version not known

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call

Disclaimer: All third-party content on this website/platform is and will remain the property of their respective owners and is provided on "as is" basis without any warranties, express or implied. Use of third-party content does not indicate any affiliation, sponsorship with or endorsement by them. Any references to third-party content is to identify the corresponding services and shall be considered fair use under The CopyrightLaw.