Abstract

The investigation of visual categorization has recently been aided by the introduction of deep convolutional neural networks (CNNs), which achieve unprecedented accuracy in picture classification after extensive training. Although the architecture of CNNs is inspired by the organization of the visual brain, the similarity between CNN and human visual processing remains unclear. Here, we investigated this issue by engaging humans and CNNs in a two-class visual categorization task. To this end, pictures containing animals or vehicles were filtered to retain only low or high spatial frequency (HSF) information, or were phase-scrambled in the spatial frequency spectrum. For all three types of degradation, accuracy increased as degradation was reduced, for both humans and CNNs; however, the thresholds for accurate categorization differed between humans and CNNs. The differences were more pronounced for HSF information than for the other two types of degradation, both in overall accuracy and in image-level agreement between humans and CNNs. The difficulty the CNNs showed in categorizing high-pass-filtered natural scenes was reduced by picture whitening, a procedure inspired by how biological visual systems process natural images. The results are discussed in relation to adaptation to regularities in the visual environment (scene statistics): if CNNs do not learn the visual characteristics of the environment, their visual categorization may depend on only a subset of the visual information on which humans rely, for example, on low spatial frequency information.
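
As a concrete illustration of these manipulations, the sketch below implements low-pass/high-pass filtering, phase scrambling, and amplitude-spectrum whitening with NumPy FFTs on a grayscale image. This is a minimal sketch under our own assumptions (the function names, normalized cutoff, and scrambling weight are illustrative), not the authors' code.

```python
# Minimal sketch (not the authors' code) of the picture degradations
# described above, applied to a 2-D grayscale image via NumPy FFTs.
import numpy as np

def radial_frequency(shape):
    """Radial frequency magnitude (cycles/pixel) for each FFT bin."""
    fy = np.fft.fftfreq(shape[0])[:, None]
    fx = np.fft.fftfreq(shape[1])[None, :]
    return np.hypot(fy, fx)

def sf_filter(img, cutoff, keep="low"):
    """Keep only frequencies below (keep='low') or above (keep='high')
    the normalized cutoff (0..0.5)."""
    F = np.fft.fft2(img)
    r = radial_frequency(img.shape)
    mask = (r <= cutoff) if keep == "low" else (r >= cutoff)
    return np.fft.ifft2(F * mask).real

def phase_scramble(img, alpha=1.0, rng=None):
    """Add random phase noise (weight alpha in 0..1) while keeping the
    amplitude spectrum. Taking the real part afterwards is a common
    shortcut; a strict version would use Hermitian-symmetric noise."""
    rng = np.random.default_rng() if rng is None else rng
    F = np.fft.fft2(img)
    noise = rng.uniform(-np.pi, np.pi, size=img.shape)
    return np.fft.ifft2(np.abs(F) * np.exp(1j * (np.angle(F) + alpha * noise))).real

def whiten(img, eps=1e-8):
    """Flatten the amplitude spectrum (keeping phase), compensating for
    the steep 1/f amplitude falloff typical of natural images."""
    F = np.fft.fft2(img)
    return np.fft.ifft2(F / (np.abs(F) + eps)).real
```

In this view, whitening boosts the relative energy of high spatial frequencies, which is one way to see why it could ease the categorization of high-pass-filtered scenes.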

Highlights

  • Making sense of the world and taking appropriate decisions are essential for adaptive behavior and survival

  • For low-passed pictures, all convolutional neural networks (CNNs) showed a psychometric function that was shifted toward higher low-pass cutoffs than that of human participants, indicating that CNNs needed less degraded pictures to achieve good categorization performance (see the fitting sketch after this list)

  • In the experiments with nonwhitened pictures examined here, we focus on the performance of the CNNs and observe that, in most high spatial frequency (HSF) conditions, their accuracy is considerably lower than that of humans, and that this difference is larger for HSF than for low spatial frequency (LSF) filtering
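
To make the threshold comparison concrete, the sketch below fits a logistic psychometric function to accuracy as a function of low-pass cutoff and reports its inflection point as the threshold. The accuracy values are invented purely for illustration; the paper's actual cutoffs, data, and fitting procedure may differ.

```python
# Sketch: fit a logistic psychometric function (chance-to-ceiling) to
# accuracy vs. low-pass cutoff; a rightward-shifted threshold means less
# degraded pictures are needed for good performance. Data are invented.
import numpy as np
from scipy.optimize import curve_fit

def psychometric(cutoff, threshold, slope, chance=0.5, lapse=0.02):
    return chance + (1 - chance - lapse) / (1 + np.exp(-slope * (cutoff - threshold)))

cutoffs   = np.array([1.0, 2.0, 4.0, 8.0, 16.0, 32.0])       # cycles/image (illustrative)
acc_human = np.array([0.52, 0.61, 0.78, 0.92, 0.96, 0.97])
acc_cnn   = np.array([0.50, 0.53, 0.62, 0.81, 0.93, 0.97])   # shifted toward higher cutoffs

for name, acc in (("human", acc_human), ("CNN", acc_cnn)):
    (thr, slope), _ = curve_fit(psychometric, cutoffs, acc, p0=[8.0, 0.5])
    print(f"{name}: threshold ~ {thr:.1f} cycles/image, slope ~ {slope:.2f}")
```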

Introduction

Making sense of the world and taking appropriate decisions are essential for adaptive behavior and survival. For vision, this means making sense of the light and shade projected onto the retina. How visual understanding is achieved is the object of study in diverse disciplines, such as general psychology, visual neuroscience, and computer science. These disciplines have shown a converging interest in artificial simulations of vision, namely deep convolutional neural networks (CNNs), which rival humans in their capability to classify visual scenes (e.g., He, Zhang, Ren, & Sun, 2016).
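
As an illustration of this capability, a pretrained ResNet (He et al., 2016) can be applied to a picture in a few lines with torchvision. This sketch is not the study's pipeline, and the input file name is a placeholder.

```python
# Sketch: classify one picture with a pretrained ResNet-50 (He et al., 2016)
# from torchvision; "scene.jpg" is a placeholder, not a file from the study.
import torch
from PIL import Image
from torchvision.models import resnet50, ResNet50_Weights

weights = ResNet50_Weights.IMAGENET1K_V2
model = resnet50(weights=weights).eval()
preprocess = weights.transforms()                 # resize, crop, normalize

img = Image.open("scene.jpg").convert("RGB")
with torch.no_grad():
    probs = model(preprocess(img).unsqueeze(0)).softmax(dim=1)

top = probs.topk(5)
for p, idx in zip(top.values[0], top.indices[0]):
    print(f"{weights.meta['categories'][int(idx)]}: {p:.3f}")
```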
