Abstract
Deep neural network models are often taken to be direct models of the hierarchical visual system; under this framework, benchmarking efforts like BrainScore (Schrimpf et al., 2018) typically seek the single model with the best overall brain predictivity. However, these models also provide unprecedented experimental opportunities to systematically explore how visual representations form, by manipulating the visual diet, the architectural inductive biases, or the format pressures induced by the task, while holding the other factors constant. Here we consider targeted comparisons across more than 110 models and leverage the most extensive fMRI dataset of visual system responses collected to date, the Natural Scenes Dataset (NSD), to explore which factors give rise to more or less brain-like representational formats. The factor showing the greatest variation in visual system brain predictivity was the task: holding both architecture and input constant, object categorization yields more brain-like representations than tasks such as autoencoding, segmentation, and depth prediction. Self-supervised tasks (e.g., SimCLR, Barlow Twins, CLIP) showed comparable or better brain predictivity than architecture-matched supervised object-categorization networks. In contrast, even highly diverse architectures (e.g., CNNs, transformers, MLP-Mixers), with both the input and the object-categorization task held constant, showed little to no difference in brain predictivity. Notably, the analytical method employed (e.g., RSA with or without voxel-wise feature weighting) also had a dramatic impact on the magnitude of brain predictivity. Each analysis method makes implicit theoretical commitments about the linking hypotheses between artificial neurons, voxel responses, and the structure of population geometry, and these commitments warrant deeper consideration. Broadly, while these results provide a current snapshot of the best-fitting models of the human ventral visual stream, we also offer controlled model comparison as a paradigm for advancing our understanding of the pressures guiding visual representation formation, with the aspiration of building increasingly stable insights with every highly performant model that arrives on the scene.
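To make the methodological contrast concrete, the sketch below illustrates, on synthetic data, the two analysis regimes the abstract contrasts: classic RSA, where every model feature contributes equally to the representational geometry, versus a voxel-wise encoding model that learns per-voxel feature weights. This is a minimal illustration, not the authors' pipeline; all variable names, dimensions, and the synthetic data-generating process are hypothetical.

```python
# Minimal sketch (hypothetical data, not the authors' code) contrasting
# classic RSA with a voxel-wise feature-weighted encoding analysis.
import numpy as np
from scipy.spatial.distance import pdist
from scipy.stats import spearmanr
from sklearn.linear_model import RidgeCV
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n_stimuli, n_features, n_voxels = 200, 512, 100

# Stand-ins for model-layer activations and fMRI responses (e.g., NSD betas)
# to the same stimuli; each voxel reads out a sparse mix of model features.
model_feats = rng.standard_normal((n_stimuli, n_features))
mask = rng.random((n_features, n_voxels)) < 0.05
true_w = rng.standard_normal((n_features, n_voxels)) * mask
brain = model_feats @ true_w + rng.standard_normal((n_stimuli, n_voxels))

# Classic RSA: Spearman-correlate the condensed representational
# dissimilarity matrices (RDMs) of model and brain; no feature weighting.
model_rdm = pdist(model_feats, metric="correlation")
brain_rdm = pdist(brain, metric="correlation")
rho, _ = spearmanr(model_rdm, brain_rdm)
print(f"classic RSA (Spearman rho): {rho:.3f}")

# Voxel-wise feature weighting: fit per-voxel ridge weights on a training
# split, then score predictivity as the mean correlation between predicted
# and observed voxel responses on held-out stimuli.
train, test = train_test_split(np.arange(n_stimuli), test_size=0.25,
                               random_state=0)
enc = RidgeCV(alphas=np.logspace(-2, 4, 7)).fit(model_feats[train],
                                                brain[train])
pred = enc.predict(model_feats[test])
voxel_r = [np.corrcoef(pred[:, v], brain[test][:, v])[0, 1]
           for v in range(n_voxels)]
print(f"voxel-wise encoding (mean r): {np.mean(voxel_r):.3f}")
```

In this toy setting the weighted analysis scores higher because the fitted weights absorb the (here, sparse) mapping between feature space and voxel space, whereas unweighted RSA demands that the raw population geometries already match. This difference is one way of seeing the abstract's point that each analysis method embodies a distinct linking hypothesis between artificial units and voxel responses.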