Do Humans and Deep Convolutional Neural Networks Use Visual Information Similarly for the Categorization of Natural Scenes?

Andrea De Cesarei, Shari Cavicchi, Marco Lippi, Giampaolo Cristadoro

doi:10.1111/cogs.13009

Abstract

The investigation of visual categorization has recently been aided by the introduction of deep convolutional neural networks (CNNs), which achieve unprecedented accuracy in picture classification after extensive training. Even if the architecture of CNNs is inspired by the organization of the visual brain, the similarity between CNN and human visual processing remains unclear. Here, we investigated this issue by engaging humans and CNNs in a two‐class visual categorization task. To this end, pictures containing animals or vehicles were modified to contain only low/high spatial frequency (HSF) information, or were scrambled in the phase of the spatial frequency spectrum. For all types of degradation, accuracy increased as degradation was reduced for both humans and CNNs; however, the thresholds for accurate categorization varied between humans and CNNs. More remarkable differences were observed for HSF information compared to the other two types of degradation, both in terms of overall accuracy and image‐level agreement between humans and CNNs. The difficulty with which the CNNs were shown to categorize high‐passed natural scenes was reduced by picture whitening, a procedure which is inspired by how visual systems process natural images. The results are discussed concerning the adaptation to regularities in the visual environment (scene statistics); if the visual characteristics of the environment are not learned by CNNs, their visual categorization may depend only on a subset of the visual information on which humans rely, for example, on low spatial frequency information.
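The three degradations and the whitening step named in the abstract can be sketched in the Fourier domain. The following is a minimal NumPy illustration under stated assumptions, not the authors' procedure: the hard-edged radial filter, the cycles-per-image cutoff unit, the linear phase-mixing scheme, and all function names (`radial_frequency`, `frequency_filter`, `phase_scramble`, `whiten`) are hypothetical choices made for this example.

```python
import numpy as np


def radial_frequency(shape):
    """Radial frequency, in cycles per image, of every 2-D FFT coefficient."""
    fy = np.fft.fftfreq(shape[0]) * shape[0]
    fx = np.fft.fftfreq(shape[1]) * shape[1]
    return np.hypot(fy[:, None], fx[None, :])


def frequency_filter(image, cutoff, keep="low"):
    """Hard-edged low-pass (keep="low") or high-pass (keep="high") filter."""
    radius = radial_frequency(image.shape)
    mask = radius <= cutoff if keep == "low" else radius > cutoff
    return np.real(np.fft.ifft2(np.fft.fft2(image) * mask))


def phase_scramble(image, amount, rng):
    """Add random phase, scaled by `amount` in [0, 1], to the image spectrum."""
    spectrum = np.fft.fft2(image)
    # The phase of a real noise image is antisymmetric, which keeps the
    # spectrum Hermitian apart from the few self-conjugate coefficients
    # (e.g. DC); np.real discards the imaginary part those leave behind.
    noise_phase = np.angle(np.fft.fft2(rng.standard_normal(image.shape)))
    mixed = np.abs(spectrum) * np.exp(1j * (np.angle(spectrum) + amount * noise_phase))
    return np.real(np.fft.ifft2(mixed))


def whiten(image, eps=1e-8):
    """Flatten the amplitude spectrum while keeping the phase spectrum."""
    spectrum = np.fft.fft2(image)
    return np.real(np.fft.ifft2(spectrum / (np.abs(spectrum) + eps)))


rng = np.random.default_rng(0)
image = rng.standard_normal((64, 64))  # stand-in for a grayscale picture
low = frequency_filter(image, cutoff=8, keep="low")
high = frequency_filter(image, cutoff=8, keep="high")
scrambled = phase_scramble(image, amount=0.5, rng=rng)
flat = whiten(image)
```

Because the two masks are complementary, the low-passed and high-passed versions of a picture sum back to the original. Raising the low-pass cutoff, lowering the high-pass cutoff, or lowering `amount` each corresponds to reducing the degradation.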

Highlights

Making sense of the world and taking appropriate decisions is essential for adaptive behavior and survival.
For low-passed pictures, all convolutional neural networks (CNNs) showed a psychometric function that was shifted toward higher low-pass cutoffs compared with human participants, indicating that less degraded pictures were needed to achieve good categorization performance.
In the experiments with nonwhitened pictures, we focus on the performance of CNNs and observe that, in most high spatial frequency (HSF) conditions, accuracy is considerably lower than that of humans, and that the difference from humans is larger for high-pass (HSF) than for low-pass (LSF) filtering.
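To make the idea of a shifted psychometric function concrete, here is a small sketch that assumes a logistic curve over the log2 of the low-pass cutoff, rising from chance (0.5 in a two-class task) to 1. The functional form, the `psychometric` helper, and every parameter value are illustrative assumptions and are not taken from the paper.

```python
import numpy as np


def psychometric(cutoff, threshold, slope, chance=0.5):
    """Logistic accuracy curve over log2(cutoff).

    With chance = 0.5, accuracy is 0.75 when cutoff equals threshold.
    """
    x = slope * (np.log2(cutoff) - np.log2(threshold))
    return chance + (1.0 - chance) / (1.0 + np.exp(-x))


cutoffs = np.array([2.0, 4.0, 8.0, 16.0, 32.0])  # hypothetical low-pass cutoffs
# Hypothetical thresholds, chosen only to illustrate a rightward shift.
human_like = psychometric(cutoffs, threshold=4.0, slope=2.0)
cnn_like = psychometric(cutoffs, threshold=16.0, slope=2.0)
```

A curve with the higher threshold reaches any given accuracy only at a higher cutoff, which is to say with a less degraded picture.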

Summary

Introduction

Making sense of the world and taking appropriate decisions is essential for adaptive behavior and survival. Concerning vision, this means making sense of the light and shade which is projected onto the retina. How visual understanding is achieved is the object of study in diverse disciplines, such as general psychology, visual neuroscience, and computer science. These disciplines have shown a converging interest in artificial simulations of vision, namely deep convolutional neural networks (CNNs), which compete with humans in terms of capability to classify visual scenes (e.g., He, Zhang, Ren, & Sun, 2016).

Journal: Cognitive Science
Publication Date: Jun 1, 2021
Citations: 13
License type: CC BY 4.0

Similar Papers

A Bimodal Tuning Curve for Spatial Frequency Across Left and Right Human Orbital Frontal Cortex During Object Recognition
A. R. Fintzi ... B. Z. Mahon
Cerebral Cortex | VOL. 24
10 Jan 2013

Training for object recognition with increasing spatial frequency: A comparison of deep learning with human vision.
Lev Kiar Avberšek ... Astrid Zeman
Journal of Vision | VOL. 21
17 Sep 2021

Socially anxious individuals discriminate better between angry and neutral faces, particularly when using low spatial frequency information
Oliver Langner ... Ad Van Knippenberg
Journal of Behavior Therapy and Experimental Psychiatry | VOL. 46
24 Jul 2014

Distinct role of spatial frequency in dissociative reading of ideograms and phonograms: An fMRI study
Shizuka Horie ... Satoru Miyauchi
NeuroImage | VOL. 63
25 Mar 2012
