Abstract

Recent studies suggest that deep Convolutional Neural Network (CNN) models show a higher representational similarity to macaque inferior temporal (IT) cortical responses, human ventral stream fMRI activations, and human object recognition than any other existing object recognition models. These studies employed natural images of objects. A long research tradition, however, has employed abstract shapes to probe the selectivity of IT neurons. If CNN models provide a realistic model of IT responses, then they should capture the IT selectivity for such shapes. Here, we compare the activations of CNN units for a stimulus set of 2D regular and irregular shapes with the response selectivity of macaque IT neurons and with human similarity judgments. The shape set consisted of regular shapes that differed in nonaccidental properties, and of irregular, asymmetrical shapes with curved or straight boundaries. We found that deep CNNs (AlexNet, VGG-16 and VGG-19) trained to classify natural images showed response modulations to these shapes that were similar to those of IT neurons. Untrained CNNs with the same architecture as the trained CNNs, but with random weights, showed a weaker similarity than the CNNs trained on classification. The difference between the trained and untrained CNNs emerged at the deep convolutional layers, where the similarity between the shape-related response modulations of IT neurons and the trained CNNs was high. Unlike IT neurons, human similarity judgments of the same shapes correlated best with the last layers of the trained CNNs. In particular, these deepest layers showed an enhanced sensitivity for straight versus curved irregular shapes, similar to that seen in human shape judgments. In conclusion, the representations of abstract shape similarity are highly comparable between macaque IT neurons and the deep convolutional layers of CNNs trained to classify natural images, whereas human shape similarity judgments correlate better with the deepest layers.

Highlights

  • Several studies have compared the representations of visual images in deep Convolutional Neural Networks (CNNs) with those of biological systems, such as the primate ventral visual stream [1,2,3,4]

  • The primate inferior temporal (IT) cortex is considered to be the final stage of visual processing that allows for the recognition, identification and categorization of objects

  • We examine whether deep CNNs that were trained to classify natural images of objects show response modulations to abstract shapes similar to those of macaque IT neurons

Introduction

Several studies have compared the representations of visual images in deep Convolutional Neural Networks (CNNs) with those of biological systems, such as the primate ventral visual stream [1,2,3,4]. From these existing studies, which compared deep CNN activations with neurophysiology, it is impossible to predict whether deep CNNs trained on natural images can faithfully model the selectivity of IT neurons for two-dimensional abstract shapes. Such a correspondence between CNN models and single-unit selectivity for abstract shapes is critical for assessing the generalizability of CNN models to stimuli that differ markedly from those of the trained task but have been shown to selectively drive IT neurons.
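As a rough illustration of this kind of comparison (a minimal sketch, not the authors' actual analysis pipeline), the snippet below computes a layer-wise representational similarity between the activations of a pretrained CNN and a set of neural responses. The shape images, the neural data and the choice of layer are hypothetical placeholders.

```python
# Minimal sketch of representational similarity analysis (RSA) between CNN
# activations and neural responses. Illustrative only; `shape_images` and
# `it_responses` are hypothetical placeholders, not data from this study.
import numpy as np
import torch
import torchvision.models as models
from scipy.stats import spearmanr

def rdm(responses):
    """Representational dissimilarity matrix: 1 - Pearson r between stimuli.
    `responses` has shape (n_stimuli, n_features)."""
    return 1.0 - np.corrcoef(responses)

def upper_triangle(mat):
    """Off-diagonal upper-triangle entries, used when comparing two RDMs."""
    i, j = np.triu_indices(mat.shape[0], k=1)
    return mat[i, j]

# Hypothetical inputs: 48 shape stimuli as 224x224 RGB tensors, and responses
# of 100 IT neurons to the same stimuli (n_stimuli x n_neurons).
shape_images = torch.rand(48, 3, 224, 224)   # placeholder images
it_responses = np.random.rand(48, 100)       # placeholder neural data

# AlexNet pretrained on ImageNet classification, in evaluation mode.
net = models.alexnet(weights=models.AlexNet_Weights.IMAGENET1K_V1).eval()

# Capture the activations of one deep convolutional layer via a forward hook.
activations = {}
def hook(module, inputs, output):
    activations["layer"] = output.flatten(start_dim=1).detach().numpy()

net.features[10].register_forward_hook(hook)  # fifth conv layer of AlexNet
with torch.no_grad():
    net(shape_images)

# Correlate the CNN-layer RDM with the IT RDM (Spearman over RDM entries).
cnn_rdm = rdm(activations["layer"])
it_rdm = rdm(it_responses)
rho, _ = spearmanr(upper_triangle(cnn_rdm), upper_triangle(it_rdm))
print(f"CNN layer vs IT representational similarity: rho = {rho:.3f}")
```

Repeating such a comparison for each layer of trained and randomly initialized networks yields the kind of layer-by-layer similarity profile described in the abstract.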
