Abstract

Anterior regions of the ventral visual stream encode substantial information about object categories. Are top-down category-level forces critical for arriving at this representation, or can this representation be formed purely through domain-general learning of natural image structure? Here we present a fully self-supervised model which learns to represent individual images, rather than categories, such that views of the same image are embedded nearby in a low-dimensional feature space, distinctly from other recently encountered views. We find that category information implicitly emerges in the local similarity structure of this feature space. Further, these models learn hierarchical features which capture the structure of brain responses across the human ventral visual stream, on par with category-supervised models. These results provide computational support for a domain-general framework guiding the formation of visual representation, where the proximate goal is not explicitly about category information, but is instead to learn unique, compressed descriptions of the visual world.
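To make this proximate goal concrete, the sketch below (in PyTorch) illustrates one way an instance-level contrastive objective of this kind can be written: two augmented views of the same image are pulled together in a low-dimensional embedding space while being pushed apart from embeddings of other recently encountered views held in a memory queue. This is an illustrative sketch under assumed details rather than the paper's exact implementation; the function name, temperature value, and queue handling are placeholders.

    import torch
    import torch.nn.functional as F

    def instance_contrastive_loss(view_a, view_b, queue, temperature=0.07):
        # view_a, view_b: [batch, dim] embeddings of two augmentations of the
        # same images; queue: [queue_len, dim] embeddings of recently seen views.
        view_a = F.normalize(view_a, dim=1)
        view_b = F.normalize(view_b, dim=1)
        queue = F.normalize(queue, dim=1)

        # Positive similarity: the two views of each image should agree.
        pos = torch.sum(view_a * view_b, dim=1, keepdim=True)   # [batch, 1]
        # Negative similarities: against other recently encountered views.
        neg = view_a @ queue.t()                                 # [batch, queue_len]

        # Cross-entropy over [positive | negatives], with the positive at index 0.
        logits = torch.cat([pos, neg], dim=1) / temperature
        labels = torch.zeros(logits.size(0), dtype=torch.long, device=logits.device)
        return F.cross_entropy(logits, labels)

In training, a shared encoder would embed both augmented views, the loss would be computed against the current queue, and the new embeddings would then be enqueued (with the oldest entries dropped), so that recently encountered views supply the negatives without any category labels.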

Highlights

  • Anterior regions of the ventral visual stream encode substantial information about object categories

  • In what has sometimes been taken as converging support for the role of category-level pressures in forming visual representations, deep convolutional neural network models—trained directly to support object categorization—develop hierarchical feature spaces that show an emergent match with brain responses [17,18,19,20,21,22,23,24]

  • On deeper examination, it is not clear that the category-level supervisory signals involved in training deep neural networks are a good proxy for the representational pressures implied in the domain-level cognitive neuroscience theories. These deep neural network models are trained with much finer-grained distinctions at the subordinate category level

Introduction

Anterior regions of the ventral visual stream encode substantial information about object categories. Alternative theories of visual representation formation put relatively more weight on the structure of natural image statistics, and relatively less weight on downstream output needs driving visual representation formation [9,10,12,28,29]. These theoretical proposals argue that the visual cortex is a generic covariance extractor and that there are systematic differences in the way things look (e.g., among faces and scenes; among animals, big objects, and small objects); it is these perceptual feature differences that underlie the ‘categorical’ distinctions of high-level visual responses [28,30,31]. A key challenge remains to make this domain-general proposal more computationally explicit: what is an alternative representational goal, if not category supervision, that might serve as an internal learning signal to draw out useful structure from natural image statistics?
