Abstract

Whether viewed or heard, an object in action can be segmented as a distinct salient event based on a number of different sensory cues. In the visual system, several low-level attributes of an image are processed along parallel hierarchies, involving intermediate stages wherein gross-level object form and/or motion features are extracted prior to stages that show greater specificity for different object categories (e.g., people, buildings, or tools). In the auditory system, though relying on a rather different set of low-level signal attributes, meaningful real-world acoustic events and “auditory objects” can also be readily distinguished from background scenes. However, the nature of the acoustic signal attributes or gross-level perceptual features that may be explicitly processed along intermediate cortical processing stages remains poorly understood. Examining mechanical and environmental action sounds, representing two distinct non-biological categories of action sources, we had participants assess the degree to which each sound was perceived as object-like versus scene-like. We re-analyzed data from two of our earlier functional magnetic resonance imaging (fMRI) task paradigms (Engel et al., 2009) and found that scene-like action sounds preferentially led to activation along several midline cortical structures, but with strong dependence on listening task demands. In contrast, bilateral foci along the superior temporal gyri (STG) showed parametrically increasing activation to action sounds rated as more “object-like,” independent of sound category or task demands. Moreover, these STG regions also showed parametric sensitivity to spectral structure variations (SSVs) of the action sounds—a quantitative measure of change in entropy of the acoustic signals over time—and the right STG additionally showed parametric sensitivity to measures of mean entropy and harmonic content of the environmental sounds. Analogous to the visual system, intermediate stages of the auditory system appear to process or extract a number of quantifiable low-order signal attributes that are characteristic of action events perceived as being object-like, representing stages that may begin to dissociate different perceptual dimensions and categories of everyday, real-world action sounds.
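To make the entropy-based acoustic measures mentioned above concrete, the sketch below computes a short-time spectral entropy profile of a sound and two summary statistics loosely analogous to the mean entropy and SSV measures. The paper's exact computation is not given here, so the windowing, normalization, summary statistics, and function names (spectral_entropy_profile, entropy_summaries) are illustrative assumptions rather than the authors' method.

```python
# Illustrative sketch only: the exact SSV computation is not specified in this excerpt,
# so the windowing, normalization, and summary statistics below are assumptions.
import numpy as np
from scipy.signal import stft

def spectral_entropy_profile(x, fs, nperseg=1024):
    """Shannon entropy of the normalized power spectrum in each short-time frame."""
    _, _, Zxx = stft(x, fs=fs, nperseg=nperseg)
    power = np.abs(Zxx) ** 2                                 # power spectrogram (freq x time)
    p = power / (power.sum(axis=0, keepdims=True) + 1e-12)   # per-frame probability distribution
    return -(p * np.log2(p + 1e-12)).sum(axis=0)             # entropy per time frame (bits)

def entropy_summaries(x, fs):
    """Mean spectral entropy plus an SSV-like measure of its change over time."""
    h = spectral_entropy_profile(x, fs)
    mean_entropy = h.mean()                # overall mean entropy of the signal
    ssv_like = np.abs(np.diff(h)).mean()   # average frame-to-frame change in entropy
    return mean_entropy, ssv_like
```

Under these assumptions, a steady broadband scene (e.g., rain) would yield high mean entropy but a relatively flat entropy profile, whereas a discrete mechanical event would produce larger frame-to-frame entropy changes.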

Highlights

  • For sensory systems, feature extraction models (Laaksonen et al., 2004) represent potential neuronal mechanisms that may develop to efficiently segment and distinguish objects or events based on salient features and components within a scene

  • In our earlier studies examining these same data we reported that the medial two-thirds of Heschl’s gyrus (HG), the approximate location of primary auditory cortices (PACs), were strongly activated by both the mechanical and environmental sound stimuli; there was no differential activation to these different conceptual categories of sound in these regions (Engel et al., 2009; Lewis et al., 2011)

  • We examined cortical responses to the same mechanical and environmental sound stimuli but “re-grouped” them according to their perceptual ratings along a putative continuum of object-like to scene-like; psychophysical ratings of the mechanical and environmental sounds were derived from non-imaging listeners (n = 18) who rated the sounds on a Likert scale (Figure 1A; range 1 = object-like to 5 = scene-like; refer to Methods)

Introduction

Feature extraction models (Laaksonen et al., 2004) represent potential neuronal mechanisms that may develop to efficiently segment and distinguish objects or events based on salient features and components within a scene. In the visual system, the posterior superior temporal sulcus (pSTS) and lateral occipital complex (LOC) appear to house hierarchically intermediate processing stages or channels for analyzing gross-level visual objects or object-like features by assimilating inputs from earlier areas that represent a variety of low-level visual attributes. This hierarchical processing may contribute to the segmentation of a distinct object, or objects, present within a complex visual scene (Felleman and Van Essen, 1991; MacEvoy and Epstein, 2011).
