Abstract

Most researchers accept that preparing to listen for a source with a particular attribute (e.g., from a particular location or a particular speaker) causes preparatory changes in how subsequent sound inputs are processed, and thus in how an auditory scene is analyzed. However, the dynamics of how humans parse an auditory scene are complex, depending not only on this kind of volitional attentional modulation but also on automatic processes. Moreover, these automatic perceptual processes depend both on the recent statistical properties of the input sound (e.g., regularities that determine whether an input is unexpected or novel versus predictable) and, at least based on results from our own laboratory, on which perceptual object was the focus of attention in the preceding moments. For instance, we find that once a given stream of sound is the focus of attention, subsequent sound elements that are perceptually similar to it are enhanced in an obligatory process, even in the absence of volitional attentional focus. Similarly, changes in sound attributes that cause perceptual discontinuities disrupt processing of auditory inputs. These factors, which strongly shape human processing of auditory scenes, will be discussed and contrasted with the processing that governs many machine algorithms for auditory scene analysis.
