Perceptual normalization for speaking rate occurs below the level of the syllable.

Margaret Cychosz, Rochelle S Newman

doi:10.1121/10.0017360

Abstract

Because speaking rates are highly variable, listeners must use cues like phoneme or sentence duration to normalize speech across different contexts. Scaling speech perception in this way allows listeners to distinguish between temporal contrasts, like voiced and voiceless stops, even at different speech speeds. It has long been assumed that this speaking rate normalization can occur over small units such as phonemes. However, phonemes lack clear boundaries in running speech, so it is not clear that listeners can rely on them for normalization. To evaluate this, we isolate two potential processing levels for speaking rate normalization (syllabic and sub-syllabic) by manipulating phoneme duration in order to cue speaking rate, while also holding syllable duration constant. In doing so, we show that changing the duration of phonemes both with unique spectro-temporal signatures (/kɑ/) and more overlapping spectro-temporal signatures (/wɪ/) results in a speaking rate normalization effect. These results suggest that when acoustic boundaries within syllables are less clear, listeners can normalize for rate differences on the basis of sub-syllabic units.

Full Text