Abstract

The development of deep convolutional neural networks (CNNs) has recently led to great successes in computer vision, and CNNs have become de facto computational models of vision. However, a growing body of work suggests that they exhibit critical limitations on tasks beyond image categorization. Here, we study one such fundamental limitation, concerning the judgment of whether two simultaneously presented items are the same or different (SD) compared with a baseline assessment of their spatial relationship (SR). In both human subjects and artificial neural networks, we test the prediction that SD tasks recruit additional cortical mechanisms that underlie critical aspects of visual cognition not explained by current computational models. We thus recorded electroencephalography (EEG) signals from human participants engaged in the same tasks as the computational models. Importantly, in humans the two tasks were matched for difficulty by an adaptive psychometric procedure; yet, in addition to a modulation of evoked potentials (EPs), our results revealed higher activity in the low β (16–24 Hz) band in the SD condition compared with the SR condition. We surmise that these oscillations reflect the crucial involvement of additional mechanisms, such as working memory and attention, which are missing in current feed-forward CNNs.
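The band-power contrast described above can be sketched in a few lines. The sketch below is illustrative only: the 250 Hz sampling rate, the use of Welch's method, the synthetic 20 Hz signal, and the alpha comparison band are assumptions for demonstration, not the study's actual analysis pipeline.

```python
import numpy as np
from scipy.signal import welch

def band_power(signal, fs, f_lo, f_hi):
    """Average spectral power of `signal` within the [f_lo, f_hi] Hz band."""
    freqs, psd = welch(signal, fs=fs, nperseg=fs)  # 1-s windows -> 1 Hz resolution
    mask = (freqs >= f_lo) & (freqs <= f_hi)
    return psd[mask].mean()

# Synthetic "EEG": a 20 Hz oscillation (low beta) buried in noise.
fs = 250  # Hz, a common EEG sampling rate (assumption)
t = np.arange(0, 10, 1 / fs)
rng = np.random.default_rng(0)
eeg = np.sin(2 * np.pi * 20 * t) + 0.5 * rng.standard_normal(t.size)

beta = band_power(eeg, fs, 16, 24)   # low-beta band reported in the study
alpha = band_power(eeg, fs, 8, 12)   # comparison band (illustrative choice)
print(beta > alpha)  # the 20 Hz component dominates -> True
```

In an actual EEG analysis, such band power would be computed per trial and per condition (SD vs. SR) before statistical comparison.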

Highlights

  • EEG signals measured from human participants performing the two tasks differ significantly in both evoked potentials (EPs) and oscillatory dynamics

  • The effortless ability of humans and other animals (Wasserman et al., 2012; Daniel et al., 2015) to learn SD tasks suggests the possible involvement of additional computations that are lacking in CNNs, possibly supporting item identification or segmentation


Introduction

The field of artificial vision has witnessed an impressive boost in the last few years, driven by the striking results of deep convolutional neural networks (CNNs). Such hierarchical neural networks process information sequentially, through a feedforward cascade of filtering, rectification, and normalization operations. The accuracy of these architectures is approaching, and sometimes exceeding, that of human observers on key visual recognition tasks, including object (He et al., 2016) and face recognition (Phillips et al., 2018). Kim and colleagues demonstrated that CNNs learn the first class of problems (SR) far more readily than the second (SD).
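The feedforward cascade described above can be illustrated with a single toy layer. This is a minimal sketch under stated assumptions: the hand-picked edge kernel, the tiny 8×8 input, and the global (rather than local) divisive normalization are simplifications, not the architecture used by any of the cited models.

```python
import numpy as np

def conv2d(image, kernel):
    """Valid 2-D cross-correlation: the 'filtering' stage."""
    kh, kw = kernel.shape
    h = image.shape[0] - kh + 1
    w = image.shape[1] - kw + 1
    out = np.empty((h, w))
    for i in range(h):
        for j in range(w):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

def relu(x):
    """Rectification: keep only positive filter responses."""
    return np.maximum(x, 0.0)

def normalize(x, eps=1e-8):
    """Divisive normalization by response energy (simplified to global)."""
    return x / (np.sqrt(np.mean(x ** 2)) + eps)

# One feedforward layer applied to a toy 8x8 image with a 3x3 edge filter.
rng = np.random.default_rng(1)
image = rng.standard_normal((8, 8))
kernel = np.array([[1.0, 0.0, -1.0]] * 3)  # vertical-edge detector (assumption)
layer_out = normalize(relu(conv2d(image, kernel)))
print(layer_out.shape)  # (6, 6)
```

A real CNN stacks many such layers with learned kernels, but each layer is purely feedforward: no recurrence, memory, or attention intervenes between input and output, which is the point at issue in this study.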
