Auditory scene context facilitates visual recognition of objects in consistent visual scenes

Takahiro Saiki, Ryosuke Niimi, Kazuhiko Yokosawa

doi:10.3758/s13414-023-02699-0

Abstract

Visual object recognition is facilitated by contextually consistent scenes in which the object is embedded. Scene gist representations extracted from the scenery backgrounds yield this scene consistency effect. Here we examined whether the scene consistency effect is specific to the visual domain or whether it is crossmodal. Across four experiments, we assessed the accuracy of naming briefly presented visual objects. In each trial, a 4-s sound clip was presented, and a visual scene containing the target object was briefly shown at the end of the sound clip. In the consistent sound condition, an environmental sound associated with the scene in which the target object typically appears was presented (e.g., forest noise for a bear target object). In the inconsistent sound condition, a sound clip contextually inconsistent with the target object was presented (e.g., city noise for a bear). In the control sound condition, a nonsensical sound (sawtooth wave) was presented. When target objects were embedded in contextually consistent visual scenes (Experiment 1: a bear in a forest background), consistent sounds increased object-naming accuracy. In contrast, sound conditions did not show a significant effect when target objects were embedded in contextually inconsistent visual scenes (Experiment 2: a bear in a pedestrian crossing background) or in a blank background (Experiments 3 and 4). These results suggest that auditory scene context has weak or no direct influence on visual object recognition. It seems likely that consistent auditory scenes indirectly facilitate visual object recognition by promoting visual scene processing.

Full Text