Abstract

Formal models of human phonetic perception are often formulated at a highly abstracted, computational level of description. One cost of such an approach is that it can be exceedingly difficult to translate a “high-level” computational theory into the “low-level” neural circuitry which implicitly underpins any theory of human perception. Here we present our initial efforts to formulate a theory of phonetic perception in terms of known neurological primitives—in this case a particular mathematical characterization of the stimulus/response characteristics of neurons in mammalian auditory cortex: the spectro-temporal receptive field. We propose that phonetic categories can be modeled as ensembles of these cells, and use computer simulations to demonstrate that such an approach exhibits the psycho-acoustical warping characteristic of categorical perception. This model has no acoustic “features” as traditionally construed: no formants, no spectral moments, no MFCCs, etc. The primitive objects of this model are simply the spectrogram-like time-frequency representations generated by the inner ear. Thus, this model is formulated entirely in terms of the peripheral and central primitives of the ascending auditory pathway. The approach also has appealing computational properties, and can be demonstrated to be a computationally tractable, efficient coding transform of speech acoustics.
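To make the central object concrete, the following is a minimal sketch of how a spectro-temporal receptive field (STRF) is commonly treated as a linear filter over a spectrogram, with a small ensemble of such filters producing one response trace per cell. It is not the authors' implementation: the Gabor-like filter shape, the function names (`gabor_strf`, `strf_response`, `ensemble_response`), and all parameter values are assumptions chosen for illustration only.

```python
import numpy as np

def gabor_strf(n_freq=32, n_lag=20, freq_center=16.0, freq_width=4.0,
               temporal_rate=0.1, lag_decay=6.0):
    """Hypothetical STRF: Gaussian tuning over frequency channels,
    decaying oscillation over time lag. Shape: (n_freq, n_lag)."""
    f = np.arange(n_freq)[:, None]    # frequency channel index
    tau = np.arange(n_lag)[None, :]   # time lag in frames
    spectral = np.exp(-0.5 * ((f - freq_center) / freq_width) ** 2)
    temporal = np.exp(-tau / lag_decay) * np.cos(2 * np.pi * temporal_rate * tau)
    return spectral * temporal

def strf_response(spectrogram, strf):
    """Linear response r(t) = sum_f sum_tau strf[f, tau] * spectrogram[f, t - tau]."""
    n_freq, n_time = spectrogram.shape
    response = np.zeros(n_time)
    for f in range(n_freq):
        # 'full' convolution truncated to n_time is a causal filter per channel
        response += np.convolve(spectrogram[f], strf[f], mode="full")[:n_time]
    return response

def ensemble_response(spectrogram, strfs):
    """Stack the responses of several STRFs: shape (n_cells, n_time)."""
    return np.stack([strf_response(spectrogram, h) for h in strfs])

if __name__ == "__main__":
    # Assumed toy input: a random non-negative "spectrogram", 32 channels x 100 frames.
    rng = np.random.default_rng(0)
    spec = rng.random((32, 100))
    cells = [gabor_strf(freq_center=c) for c in (8.0, 16.0, 24.0)]
    print(ensemble_response(spec, cells).shape)
```

In this sketch the input is only a time-frequency array, with no formants or MFCCs computed along the way, which mirrors the abstract's point that the model's primitives are spectrogram-like representations. How ensemble responses would be mapped to phonetic categories is left open here.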
