Abstract

Theories of auditory scene analysis suggest that our perception of scenes relies on identifying and segregating the objects within them. However, a more global process may also occur while analyzing scenes, as has been evidenced in the visual domain. In our first experiment, we studied the perception of eight global properties (e.g., openness) using a collection of 200 high-quality auditory scenes. Participants showed high agreement in their ratings of the global properties, and the global properties were explained by a two-factor model. The acoustic features of the scenes were explained by a seven-factor model and linearly predicted the global ratings to varying degrees (R² = 0.33–0.87), although we also observed nonlinear relationships between the acoustic and global variables. In a multi-layer neural network trained to recognize auditory objects in everyday soundscapes from YouTube, high-level embeddings of our 200 scenes were correlated with some global variables at earlier stages of processing than others. In a second experiment, we evaluated participants' accuracy in identifying the setting of, and the objects within, scenes at three durations (1, 2, and 4 s). Overall, participants performed better on the object-identification task but needed longer-duration stimuli to do so. These results suggest that object identification may require more processing time and/or attention switching than setting identification.
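As a rough illustration of the analysis pipeline summarized above (factor models of the ratings and acoustic features, plus per-property linear regressions), the following is a minimal Python sketch assuming scikit-learn and randomly generated placeholder data; the variable names and the number of acoustic features are hypothetical and do not reproduce the authors' actual pipeline or results.

import numpy as np
from sklearn.decomposition import FactorAnalysis
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score

rng = np.random.default_rng(0)
n_scenes = 200                          # the study uses 200 auditory scenes
ratings = rng.random((n_scenes, 8))     # placeholder: ratings of 8 global properties
acoustic = rng.random((n_scenes, 20))   # placeholder: acoustic feature matrix (20 is arbitrary)

# Two-factor model of the eight global-property ratings
fa_ratings = FactorAnalysis(n_components=2).fit(ratings)

# Seven-factor model of the acoustic features
fa_acoustic = FactorAnalysis(n_components=7).fit(acoustic)
acoustic_factors = fa_acoustic.transform(acoustic)

# Linear prediction of each global property from the acoustic factors;
# the abstract reports R-squared values ranging from 0.33 to 0.87.
for i in range(ratings.shape[1]):
    model = LinearRegression().fit(acoustic_factors, ratings[:, i])
    pred = model.predict(acoustic_factors)
    print(f"property {i}: R^2 = {r2_score(ratings[:, i], pred):.2f}")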
