Abstract

A fundamental challenge in speech perception is the resolution of a many-to-one mapping from a highly variable, continuous sensory signal onto discrete, perceptually stable categories that bear functional relevance. Recent work has identified signatures of invariance in early neural responses to speech, but the physiological mechanisms that give rise to these categorical representations remain unclear. We employed intracranial recordings in human subjects listening to and categorizing speech stimuli to investigate the spatiotemporal cortical dynamics underlying categorical perception. Stimuli comprised a voice-onset time (VOT) continuum from /b/ (0 ms VOT) to /p/ (50 ms VOT). Results revealed spatially distinct neural populations that responded selectively to tokens from one category (either /b/ or /p/). Within these subpopulations, response amplitude was modulated by stimulus prototypicality for within-category stimuli (e.g., stronger responses to 0 ms than to 10 ms VOT in /b/-selective electrodes). Over the course of a trial, this initially graded encoding of VOT rapidly evolved to reflect properties of the ultimate (categorical) behavioral response function. The same dynamics emerged in a computational neural network model simulating neuronal populations as leaky integrators tuned to detect temporally distributed acoustic cues at precise lags. Our results provide direct evidence that categorical perception of VOT arises dynamically within discrete, phonetically tuned neural populations.
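To make the model class concrete, the following is a minimal Python sketch of a leaky-integrator unit tuned to a preferred burst-to-voicing lag. It is not the authors' implementation: the Gaussian lag tuning, the pulse-shaped input, and all parameter values (time constant, tuning width, pulse duration) are illustrative assumptions.

```python
import numpy as np

def simulate_population(vot_ms, preferred_lag_ms, tau_ms=15.0,
                        dt_ms=1.0, t_max_ms=200.0, sigma_ms=10.0):
    """Leaky integrator driven by a voicing-onset cue.

    The unit is 'tuned' to a preferred lag between the stop burst
    (t = 0) and voicing onset (t = VOT). Tuning shape and parameters
    are hypothetical, chosen only to illustrate the model class.
    """
    t = np.arange(0.0, t_max_ms, dt_ms)
    # Assumed Gaussian tuning: drive is strongest when the stimulus
    # VOT matches the unit's preferred lag.
    drive_gain = np.exp(-0.5 * ((vot_ms - preferred_lag_ms) / sigma_ms) ** 2)
    # Input arrives as a brief pulse at voicing onset.
    input_current = np.where((t >= vot_ms) & (t < vot_ms + 20.0),
                             drive_gain, 0.0)
    u = np.zeros_like(t)
    for i in range(1, len(t)):
        # Euler step of the leaky-integrator dynamics:
        # du/dt = (-u + I(t)) / tau
        u[i] = u[i - 1] + dt_ms * (-u[i - 1] + input_current[i]) / tau_ms
    return t, u

# Two hypothetical populations: /b/-selective (short preferred lag)
# and /p/-selective (long preferred lag), probed across the continuum.
for vot in (0, 10, 20, 30, 40, 50):
    _, u_b = simulate_population(vot, preferred_lag_ms=0.0)
    _, u_p = simulate_population(vot, preferred_lag_ms=50.0)
    print(f"VOT {vot:2d} ms: peak /b/ response {u_b.max():.3f}, "
          f"peak /p/ response {u_p.max():.3f}")
```

Under these assumptions the sketch reproduces the graded within-category effect reported above (e.g., a stronger /b/-population response at 0 ms than at 10 ms VOT), though not the within-trial evolution toward a categorical response, which would require recurrent dynamics or a downstream readout beyond this single-unit illustration.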
