Abstract

Deep neural networks provide the current best models of visual information processing in the primate brain. Drawing on work from computer vision, the most commonly used networks are pretrained on data from the ImageNet Large Scale Visual Recognition Challenge. This dataset comprises images from 1,000 categories, selected to provide a challenging testbed for automated visual object recognition systems. Moving beyond this common practice, we here introduce ecoset, a collection of >1.5 million images from 565 basic-level categories selected to better capture the distribution of objects relevant to humans. Ecoset categories were chosen to be both frequent in linguistic usage and concrete, thereby mirroring important physical objects in the world. We test the effects of training on this ecologically more valid dataset using multiple instances of two neural network architectures: AlexNet and vNet, a novel architecture designed to mimic the progressive increase in receptive field sizes along the human ventral stream. We show that training on ecoset leads to significant improvements in predicting representations in human higher-level visual cortex and perceptual judgments, surpassing the previous state of the art. Significant and highly consistent benefits are demonstrated for both architectures on two separate functional magnetic resonance imaging (fMRI) datasets and behavioral data, jointly covering responses to 1,292 visual stimuli from a wide variety of object categories. These results suggest that computational visual neuroscience may take better advantage of the deep learning framework by using image sets that reflect the human perceptual and cognitive experience. Ecoset and trained network models are openly available to the research community.

Highlights

  • Ecoset's 565 basic-level categories were selected using word frequencies from American television and film subtitles [10] and concreteness ratings from human observers [11]

  • To quantify the agreement between representations found in deep neural networks (DNNs) and the brain, we use representational similarity analysis (RSA) [15], which characterizes a system’s population code by means of a representational dissimilarity matrix (RDM, correlation distance); a minimal sketch of this computation follows after these highlights

  • DNNs were shown the same stimuli as human observers (>1,200 images of various object categories), and the resulting network RDMs were compared to RDMs extracted from higher-level visual cortex (HVC) of individual human observers
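The RDM logic described above fits in a few lines. Below is a minimal Python sketch (an illustration, not the authors' released code; the function names and the use of NumPy/SciPy are our assumptions): it builds a correlation-distance RDM from a stimulus-by-unit activation matrix and compares two RDMs via the Spearman correlation of their upper triangles.

    import numpy as np
    from scipy.spatial.distance import pdist, squareform
    from scipy.stats import spearmanr

    def compute_rdm(activations):
        # Correlation-distance RDM from an (n_stimuli, n_units) matrix:
        # entry [i, j] is 1 minus the Pearson correlation between the
        # response patterns evoked by stimuli i and j.
        return squareform(pdist(activations, metric="correlation"))

    def compare_rdms(rdm_a, rdm_b):
        # Spearman correlation between the upper triangles of two RDMs
        # (the diagonal is zero by construction and is excluded).
        iu = np.triu_indices_from(rdm_a, k=1)
        return spearmanr(rdm_a[iu], rdm_b[iu]).correlation

The same two functions apply unchanged whether the input rows are network-layer activations or fMRI voxel responses, which is what makes RSA a convenient common currency for model-brain comparison.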


Introduction

Ecoset's categories were chosen on the basis of word frequencies in American television and film subtitles [10] and concreteness ratings from human observers [11]. To test whether training DNNs on ecoset rather than ILSVRC 2012 might help to better explain cortical representations in human higher-level visual cortex, we train multiple network instances on each dataset and compare their internal representations against data from two independent functional magnetic resonance imaging (fMRI) studies of human vision [12, 13] as well as human behavioral data [14].
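As a rough illustration of this comparison, the sketch below scores one network layer per training regime against a single subject's higher-level visual cortex (HVC) RDM. All arrays are random placeholders standing in for real activations and recordings; this is a hypothetical sketch of the analysis logic, not the study's actual pipeline.

    import numpy as np
    from scipy.spatial.distance import pdist
    from scipy.stats import spearmanr

    def rdm_vector(responses):
        # Condensed (upper-triangle) correlation-distance RDM from an
        # (n_stimuli, n_units) response matrix.
        return pdist(responses, metric="correlation")

    # Placeholder data: 100 stimuli, one layer per training regime,
    # and one subject's voxel responses from higher-level visual cortex.
    rng = np.random.default_rng(0)
    ecoset_layer = rng.normal(size=(100, 512))
    ilsvrc_layer = rng.normal(size=(100, 512))
    hvc_voxels = rng.normal(size=(100, 2000))

    brain_rdm = rdm_vector(hvc_voxels)
    for name, layer in [("ecoset", ecoset_layer), ("ILSVRC 2012", ilsvrc_layer)]:
        rho = spearmanr(rdm_vector(layer), brain_rdm).correlation
        print(f"{name}-trained layer vs. HVC: Spearman rho = {rho:.3f}")

In the actual studies, per-subject scores of this kind would be aggregated across observers and compared between the two training regimes.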

