Abstract

Object and scene perception are instantiated in overlapping networks of cortical regions, including three scene-selective areas in parahippocampal, occipital, and medial parietal cortex (PPA, OPA, and MPA), and a lateral occipital cortical area (LOC) selective for intact objects. The exact contributions of these regions to object and scene perception remain unknown. Here, we leverage BOLD5000 (Chang et al., 2018), a public fMRI dataset containing responses to ~5,000 images drawn from the ImageNet, COCO, and Scenes databases, to better understand the roles of these regions in visual perception. These databases vary in the degree to which images focus on single objects, a few objects, or whole scenes, respectively. We build voxel encoding models based on features from a deep convolutional neural network (DCNN) and assess the generalization of encoding models trained and tested on all combinations of the ImageNet, COCO, and Scenes databases. As predicted, we find good generalization between models trained and tested on ImageNet and COCO, and poor generalization from ImageNet/COCO-trained models to Scenes for most DCNN layer/ROI encoding models. Surprisingly, we find generalization from ImageNet/COCO to Scenes only in early visual cortex, and only for encoding models of intermediate DCNN layers. Additionally, LOC and PPA exhibit similarly good generalization between ImageNet and COCO and similarly poor generalization to Scenes. Excluding MPA responses to Scenes, all scene-selective areas generalize well to held-out data from the trained image database, but PPA exhibits the most robust out-of-database generalization between ImageNet and COCO, reflecting a more general perceptual role. Our work reflects a novel application of encoding models in neuroscience in which distinct stimulus sets are used for training and testing in order to test the similarity of the representations underlying those stimuli.
We plan to further test the effect of pretraining the DCNN on Places365 rather than ImageNet, and to look at image-level predictors of generalization.
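The cross-database generalization analysis described above can be sketched in a few lines: fit a linear (ridge-regularized) encoding model from DCNN-layer features to voxel responses on one image database, then score per-voxel prediction accuracy on images from a different database. This is a minimal illustrative sketch with simulated data; the array shapes, regularization, and scoring choices are assumptions, not the authors' actual pipeline.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative sizes (assumed): images per database, DCNN features, ROI voxels.
n_train, n_test = 200, 100
n_feat, n_vox = 512, 50

# DCNN features for two databases (e.g. ImageNet for training, Scenes for
# testing) and simulated BOLD responses sharing a common linear mapping.
X_train = rng.standard_normal((n_train, n_feat))
X_test = rng.standard_normal((n_test, n_feat))
W_true = rng.standard_normal((n_feat, n_vox))
Y_train = X_train @ W_true + 0.5 * rng.standard_normal((n_train, n_vox))
Y_test = X_test @ W_true + 0.5 * rng.standard_normal((n_test, n_vox))

# Fit a ridge-regularized linear encoding model for all voxels at once.
lam = 1.0
W_hat = np.linalg.solve(X_train.T @ X_train + lam * np.eye(n_feat),
                        X_train.T @ Y_train)

def voxelwise_r(a, b):
    """Per-voxel Pearson correlation between two (images x voxels) arrays."""
    a = a - a.mean(axis=0)
    b = b - b.mean(axis=0)
    return (a * b).sum(axis=0) / np.sqrt((a ** 2).sum(axis=0) * (b ** 2).sum(axis=0))

# Generalization score: correlation of predicted vs. measured responses
# on the held-out, other-database images.
r = voxelwise_r(X_test @ W_hat, Y_test)
print(f"median cross-database prediction r = {np.median(r):.2f}")
```

In the actual analysis, good generalization (high r) when training on one database and testing on another would indicate that a region's responses to both stimulus sets are captured by the same feature-to-voxel mapping.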