Our visual system enables us to effortlessly navigate and recognize real-world visual environments. Functional magnetic resonance imaging (fMRI) studies suggest a network of scene-responsive cortical visual areas, but much less is known about the temporal order in which different scene properties are analysed by the human visual system. In this study, we selected a set of 36 full-colour natural scenes that varied in spatial structure and semantic content. Our male and female human participants viewed the scenes in both 2D and 3D while we recorded magnetoencephalography (MEG) data. MEG enables tracking of cortical activity in humans at a millisecond timescale. We compared the representational geometry in the MEG responses with predictions based on the scene stimuli using the representational similarity analysis framework. The representational structure first reflected the spatial structure of the scenes in the 90-125 ms time window, followed by the semantic content in the 140-175 ms time window after stimulus onset. The 3D stereoscopic viewing of the scenes affected the responses relatively late, from around 140 ms after stimulus onset. Taken together, our results indicate that the human visual system rapidly encodes a scene's spatial structure and suggest that this information is based on monocular instead of binocular depth cues.

Significance statement

Our visual system enables us to recognize and navigate our visual surroundings seemingly effortlessly, but what exactly happens in our brains remains poorly understood. With the help of time-resolved brain imaging (magnetoencephalography, MEG), we found that the brain first encodes the spatial structure of a scene (e.g., cluttered or navigable) before its semantic content (e.g., a car park or farm). Brain imaging studies typically use 2D pictures as stimuli. Here we asked whether binocular disparity, a depth cue which arises from our two eyes seeing the scene from slightly different angles, aids the coding of the spatial structure. Our results suggest that this 3D depth cue plays little role in the rapid, initial sensing of our spatial surroundings.
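
For readers unfamiliar with representational similarity analysis, the comparison between MEG response geometry and a stimulus-based prediction can be sketched as follows. This is a minimal illustration under stated assumptions, not the pipeline used in the study: the function names (`pearson`, `rdm_upper`, `rsa_timecourse`), the four-condition toy data, and the two-category model are all hypothetical, and plain Pearson correlation stands in for whatever dissimilarity and comparison measures the authors used (rank correlations are a common alternative).

```python
import math

def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(vx * vy)

def rdm_upper(patterns):
    """Upper triangle of a dissimilarity matrix (1 - Pearson r) over conditions."""
    n = len(patterns)
    return [1.0 - pearson(patterns[i], patterns[j])
            for i in range(n) for j in range(i + 1, n)]

def rsa_timecourse(meg, model_rdm):
    """Correlate the neural RDM at each time point with a model RDM.

    meg[t][c] is the sensor pattern for condition c at time point t.
    """
    return [pearson(rdm_upper(patterns), model_rdm) for patterns in meg]

# Toy data (invented): 4 conditions, 4 sensors, 2 time points.
# Conditions 0-1 share one hypothetical spatial-structure label, 2-3 the other.
meg = [
    # Time point 0: pattern similarity does not follow the labels.
    [[1, 2, 3, 4], [4, 3, 2, 1], [1, 2, 3, 5], [5, 3, 2, 1]],
    # Time point 1: conditions with the same label evoke similar patterns.
    [[1, 2, 3, 4], [1, 2, 3, 5], [4, 3, 2, 1], [5, 3, 2, 1]],
]

# Model RDM (upper triangle): 0 if two conditions share a label, else 1.
# Pair order: (0,1), (0,2), (0,3), (1,2), (1,3), (2,3).
model_rdm = [0, 1, 1, 1, 1, 0]

scores = rsa_timecourse(meg, model_rdm)
print(scores)
```

In this toy example the model fits the neural geometry only at the second time point. Applying the same comparison with separate model RDMs for spatial structure and semantic content, across all time points, is what allows the onset of each type of information to be ordered in time.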