Local features and global shape information in object classification by deep convolutional neural networks

Nicholas Baker, Hongjing Lu, Gennady Erlikhman, Philip J Kellman

doi:10.1016/j.visres.2020.04.003

Open Access

Journal: Vision Research
Publication Date: May 12, 2020
Citations: 60
License type: publisher-specific-oa

Affiliation: University of California, Los Angeles

Abstract

Deep convolutional neural networks (DCNNs) show impressive similarities to the human visual system. Recent research, however, suggests that DCNNs have limitations in recognizing objects by their shape. We tested the hypothesis that DCNNs are sensitive to an object’s local contour features but have no access to global shape information that predominates in human object recognition. We employed transfer learning to assess local and global shape processing in trained networks. In Experiment 1, we used restricted and unrestricted transfer learning to retrain AlexNet, VGG-19, and ResNet-50 to classify circles and squares. We then probed these networks with stimuli containing conflicting global shape and local contour information. We presented networks with overall square shapes composed of curved elements and circles composed of corner elements. Networks classified the test stimuli by local contour features rather than global shapes. In Experiment 2, we changed the training data to include circles and squares composed of different elements so that the local contour features of the object were uninformative. This considerably increased the network’s tendency to produce global shape responses, but deeper analyses in Experiment 3 revealed the network still showed no sensitivity to the spatial configuration of local elements. These findings demonstrate that DCNNs’ performance is an inversion of human performance with respect to global and local shape processing. Whereas abstract relations of elements predominate in human perception of shape, DCNNs appear to extract only local contour fragments, with no representation of how they spatially relate to each other to form global shapes.

Full Text