Abstract
A classic problem in spoken language comprehension is how speech is perceived as being composed of discrete words. A Syllable Inference account of spoken word recognition and segmentation is proposed, according to which alternative hierarchical models of syllables, words, and phonemes are dynamically posited. Estimates of context speech rate are combined with generative models, such that over time, models that result in local minima in error between predicted and recently experienced signals give rise to perceptions of hearing words. Evidence for this account comes from experiments using the visual world eye-tracking paradigm. Materials were sentences that were acoustically ambiguous in the numbers of syllables, words, and phonemes they contained. Time-compressing or expanding the speech materials permitted determination of how temporal information at each locus, or in its context, affected looks to, and selection of, pictures with a singular or plural referent. Supporting our account, listeners probabilistically interpreted identical chunks of speech as consistent with a singular or plural referent to a degree that was based on the chunk's gradient rate in relation to its context. These results support the Syllable Inference account that arriving temporal information informs inferences about syllables, giving rise to the perception of words separated in time.