Abstract

Deep convolutional neural networks (DCNNs) have attracted much attention recently, and have been shown to recognize thousands of object categories in natural image databases. Their architecture is somewhat similar to that of the human visual system: both use restricted receptive fields and a hierarchy of layers that progressively extract increasingly abstract features. Yet it is unknown whether DCNNs match human performance at the task of view-invariant object recognition, whether they make similar errors and use similar representations for this task, and whether the answers depend on the magnitude of the viewpoint variations. To investigate these issues, we benchmarked eight state-of-the-art DCNNs, the HMAX model, and a baseline shallow model and compared their results to those of humans with backward masking. Unlike in all previous DCNN studies, we carefully controlled the magnitude of the viewpoint variations to demonstrate that shallow nets can outperform deep nets and humans when variations are weak. When facing larger variations, however, more layers were needed to match human performance and error distributions, and to have representations that are consistent with human behavior. A very deep net with 18 layers even outperformed humans at the highest variation level, using the most human-like representations.
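
To make the benchmarking protocol concrete, here is a minimal sketch (not the authors' exact pipeline) of how invariant categorization accuracy can be estimated from a network's activations at each viewpoint-variation level, so that model accuracy can be compared with human accuracy at matched variation magnitudes. The `features_by_level` structure and the array shapes are hypothetical; the sketch assumes one layer's activations have already been extracted for every image.

```python
# Minimal sketch: linear readout on pre-extracted DCNN features, one score per variation level.
# Assumes features_by_level[level] -> (X, y): layer activations and category labels (hypothetical).
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import LinearSVC

def accuracy_per_level(features_by_level, n_folds=5):
    """Cross-validated categorization accuracy at each viewpoint-variation level."""
    scores = {}
    for level, (X, y) in features_by_level.items():
        clf = make_pipeline(StandardScaler(), LinearSVC(C=1.0, max_iter=10000))
        scores[level] = cross_val_score(clf, X, y, cv=n_folds).mean()
    return scores

# Example with random stand-in data (5 categories, 7 variation levels, 512-d features).
rng = np.random.default_rng(0)
fake = {lvl: (rng.normal(size=(200, 512)), rng.integers(0, 5, size=200)) for lvl in range(7)}
print(accuracy_per_level(fake))
```

A human accuracy curve measured over the same variation levels would then allow a direct level-by-level comparison of the kind described above.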

Highlights

  • Primates excel at view-invariant object recognition[1]

  • This approach led to new findings: (1) deeper was usually better and more human-like, but only in the presence of large variations; (2) some DCNNs reached human performance even with large variations; (3) some DCNNs had error distributions indistinguishable from those of humans; (4) some DCNNs used representations that were more consistent with human responses, and these were not necessarily the top performers.

  • We tested the DCNNs on our invariant object categorization task, which included five object categories, seven variation levels, and two background conditions.


Summary

Deep Networks Can Resemble Human Feed-forward Vision in Invariant Object Recognition (received: 19 August 2015; accepted: 11 August 2016; published: 07 September 2016).

The advantages of our work with respect to previous studies are: (1) we used a larger object database, divided into five categories; (2) most importantly, we controlled and varied the magnitude of the variations in size, position, and in-depth and in-plane rotations; (3) we benchmarked eight state-of-the-art DCNNs, the HMAX model[10] (an early biologically inspired shallow model), and a very simple shallow model that classifies directly from the pixel values (“Pixel”); (4) in our psychophysical experiments, the images were presented briefly and with backward masking, presumably blocking feedback; (5) we performed extensive comparisons between different layers of DCNNs and studied how invariance evolves through the layers; (6) we compared models and humans in terms of performance, error distributions, and representational geometry; and (7) to measure the influence of the background on the invariant object recognition problem, our dataset included both segmented and unsegmented images.

This approach led to new findings: (1) deeper was usually better and more human-like, but only in the presence of large variations; (2) some DCNNs reached human performance even with large variations; (3) some DCNNs had error distributions indistinguishable from those of humans; (4) some DCNNs used representations that were more consistent with human responses, and these were not necessarily the top performers.
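
Point (6) mentions comparing representational geometry; a common way to do this is representational similarity analysis (RSA), sketched below under the assumption that a model layer's pairwise-dissimilarity structure is correlated with a reference dissimilarity matrix derived from human responses. The helpers `rdm` and `rdm_similarity` and the stand-in data are illustrative, not the paper's exact procedure.

```python
# Minimal RSA-style sketch: compare a model layer's representational geometry
# with a reference (e.g., human-derived) dissimilarity matrix.
import numpy as np
from scipy.spatial.distance import pdist, squareform
from scipy.stats import spearmanr

def rdm(features):
    """Representational dissimilarity matrix: 1 - Pearson correlation between image pairs."""
    return squareform(pdist(features, metric="correlation"))

def rdm_similarity(model_features, reference_rdm):
    """Spearman correlation between the upper triangles of the two RDMs."""
    m = rdm(model_features)
    iu = np.triu_indices_from(m, k=1)
    rho, _ = spearmanr(m[iu], reference_rdm[iu])
    return rho

# Example with random stand-in data: 50 images, 512-d model features,
# and a reference RDM built from an arbitrary 8-d "behavioral" embedding.
rng = np.random.default_rng(1)
model_features = rng.normal(size=(50, 512))
human_rdm = squareform(pdist(rng.normal(size=(50, 8)), metric="correlation"))
print(rdm_similarity(model_features, human_rdm))
```

Higher correlations would indicate that a layer's representational geometry is more consistent with the human reference, which is the sense in which representations are compared in point (6).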


