Abstract

Separating out and correctly grouping the sounds of a communicating animal from the natural acoustic environment poses significant challenges to models of auditory scene analysis, yet animals perform this task very effectively. To date, most models have focussed on simultaneous grouping cues and the segregation of discrete sound events, although some take the longer-term context into account. Inspired by the important part that form plays in the segregation and recognition of visual objects, we consider the role of form in auditory scene analysis. By form in audition we mean the dynamic spectrotemporal patterns characterizing individual sound events as well as their timing with respect to each other. We present a model capable of segregating and recognizing natural communication calls within complex acoustic environments. Incoming sounds are processed using a model of the auditory periphery and fed into a recurrent neural network that rapidly tunes itself to respond preferentially to specific events. Representations of predictable patterns of events in the sequence are created on the fly and maintained on the basis of their predictive success and conflict with other representations. Activation levels of these representations are interpreted in terms of object recognition.
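
The abstract outlines a three-stage pipeline: a peripheral auditory model, a recurrent network that rapidly tunes to recurring events, and pattern representations maintained by predictive success. The paper's actual filterbank, learning rule, and maintenance equations are not given in the abstract, so the following is a minimal illustrative sketch under assumed stand-ins: a crude envelope filterbank for the periphery, a Hebbian-adapting leaky recurrent layer for the fast-tuning stage, and a pool of event-transition representations whose activations grow when their predictions succeed and decay when they conflict with the input. All class names, parameters, and signal settings below are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

def peripheral_model(signal, sr, n_channels=32, fmin=100.0, fmax=8000.0):
    """Crude cochleagram: envelopes of log-spaced narrowband channels,
    a stand-in for the paper's (unspecified) auditory-periphery model."""
    freqs = np.geomspace(fmin, fmax, n_channels)
    t = np.arange(len(signal)) / sr
    win = np.ones(max(1, int(sr * 0.01)))  # ~10 ms smoothing window
    win /= win.sum()
    rows = []
    for f in freqs:
        # Quadrature demodulation: lowpass the complex baseband product,
        # then take the magnitude to get the narrowband envelope at f.
        demod = np.convolve(signal * np.exp(-2j * np.pi * f * t), win, mode="same")
        rows.append(np.abs(demod))
    return np.stack(rows)  # shape (n_channels, n_samples)

class FastTuningRNN:
    """Leaky recurrent layer whose input weights adapt online (Hebbian rule),
    so units rapidly come to respond preferentially to recurring events.
    Purely illustrative; the paper's learning rule is not in the abstract."""
    def __init__(self, n_in, n_units=16, leak=0.9, eta=0.05):
        self.W_in = rng.normal(scale=0.1, size=(n_units, n_in))
        self.W_rec = rng.normal(scale=0.1, size=(n_units, n_units))
        self.h = np.zeros(n_units)
        self.leak, self.eta = leak, eta

    def step(self, x):
        drive = np.tanh(self.W_in @ x + self.W_rec @ self.h)
        self.h = self.leak * self.h + (1 - self.leak) * drive
        # Fast Hebbian tuning: strengthen input weights of active units,
        # then renormalize rows so no unit's weights grow without bound.
        self.W_in += self.eta * np.outer(self.h, x)
        self.W_in /= np.maximum(np.linalg.norm(self.W_in, axis=1, keepdims=True), 1e-8)
        return self.h

class PatternPool:
    """Representations of predictable event sequences, created on the fly.
    Each stores a previous event and a predicted next event; its activation
    grows with predictive success and decays when predictions fail."""
    def __init__(self, threshold=0.8, gain=0.2, decay=0.1):
        self.patterns = []     # list of (prev_event, predicted_next), unit norm
        self.activation = []   # one activation level per representation
        self.threshold, self.gain, self.decay = threshold, gain, decay

    def observe(self, prev_event, event):
        matched = False
        for i, (p_prev, p_next) in enumerate(self.patterns):
            if np.dot(p_prev, prev_event) > self.threshold:
                hit = np.dot(p_next, event)  # did the prediction succeed?
                self.activation[i] += self.gain * hit - self.decay * (1 - hit)
                matched = matched or hit > self.threshold
        if not matched:  # no existing representation predicted this: create one
            self.patterns.append((prev_event, event))
            self.activation.append(self.gain)
        return self.activation  # read out as graded recognition evidence

# Usage sketch: a hypothetical 100 ms tonal "call" repeating predictably in noise.
sr = 16000
call = np.sin(2 * np.pi * 1000 * np.arange(sr // 10) / sr)
scene = 0.3 * rng.normal(size=sr)
for onset in (2000, 6000, 10000):
    scene[onset:onset + len(call)] += call

cochlea = peripheral_model(scene, sr)
rnn, pool, prev = FastTuningRNN(n_in=cochlea.shape[0]), PatternPool(), None
for frame in cochlea.T[::160]:  # ~10 ms hops
    h = rnn.step(frame)
    h = h / (np.linalg.norm(h) + 1e-8)
    if prev is not None:
        pool.observe(prev, h)
    prev = h
print("pattern activations:", np.round(pool.activation, 2))
```

In this toy reading, the repeated call drives the same RNN units each time, so one transition representation keeps predicting correctly and its activation climbs above those of noise-driven patterns; interpreting such activation levels as recognition is the final step the abstract describes.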