Exploring perceptual illusions in deep neural networks

Emily J Ward

doi:10.1167/19.10.34b

Abstract

Perceptual illusions (discrepancies between what exists externally and what we actually see) tell us a great deal about how the perceptual system functions. Rather than being failures of perception, illusions reveal automatic computations and biases in visual processing that help us make better decisions from visual information. Recently, deep convolutional neural networks (DCNNs) have been very successful in a variety of complex visual tasks, such as object recognition. This success has inspired researchers to compare the internal visual representations of DCNNs to those of humans, and in many respects these representations turn out to be similar. This raises the question of whether DCNNs "experience" some of the same illusions that people do. To investigate this, I presented a DCNN trained for object classification (VGG16, trained on ImageNet) with several standard illusions, including the Müller-Lyer illusion and examples of amodal completion. Rather than relying on the network's classification output, I assessed how the DCNN "perceived" each illusion by computing the similarity between the layer activation response to the ambiguous form of the illusion and the responses to several alternative, disambiguated forms (for example, comparing the response to an occluded shape with the response to the full shape vs. the response to an incomplete, notched shape). For the Müller-Lyer illusion, in all convolutional layers, the response to lines with two outward tails was more similar to the response to objectively longer lines than to shorter lines (p < 0.001), consistent with human perception. However, the response to lines with two inward heads was also more similar to longer lines (p < 0.001), inconsistent with human perception. For amodal completion, the response to occluded shapes was consistently more similar to the incomplete, notched shapes (p = 0.003), even when the shapes were real objects.
These results suggest that despite their human-level performance on object recognition, DCNNs do not demonstrate some of the fundamental behaviors of the human visual system.
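The comparison step described above can be sketched as follows. This is a minimal illustration, assuming the layer activations for each stimulus have already been extracted from the network and flattened into equal-length vectors. The abstract does not say which similarity measure was used, so Pearson correlation is an assumption here; the function names, layer labels, and toy numbers are hypothetical and are not data from the study. The statistical test behind the reported p-values is not reproduced.

```python
import math
from typing import Dict, Sequence


def pearson_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Pearson correlation between two flattened activation vectors.

    Assumption: the original study's similarity measure is not specified.
    """
    if len(a) != len(b):
        raise ValueError("activation vectors must have the same length")
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        raise ValueError("correlation is undefined for a constant vector")
    return cov / math.sqrt(var_a * var_b)


def closest_alternative(ambiguous: Sequence[float],
                        alternatives: Dict[str, Sequence[float]]) -> str:
    """Name of the disambiguated form whose response best matches the ambiguous one."""
    scores = {name: pearson_similarity(ambiguous, act)
              for name, act in alternatives.items()}
    return max(scores, key=scores.get)


def compare_layers(ambiguous_by_layer: Dict[str, Sequence[float]],
                   alternatives_by_layer: Dict[str, Dict[str, Sequence[float]]]
                   ) -> Dict[str, str]:
    """Apply the comparison separately in every layer (hypothetical layer keys)."""
    return {layer: closest_alternative(act, alternatives_by_layer[layer])
            for layer, act in ambiguous_by_layer.items()}


if __name__ == "__main__":
    # Toy activations for an occluded shape (ambiguous) and two
    # disambiguated forms; the values are made up for illustration.
    ambiguous = {"conv1": [0.2, 0.9, 0.4, 0.7], "conv2": [1.0, 0.1, 0.5, 0.3]}
    alternatives = {
        "conv1": {"full": [0.3, 0.8, 0.5, 0.6], "notched": [0.9, 0.1, 0.6, 0.2]},
        "conv2": {"full": [0.2, 0.9, 0.3, 0.8], "notched": [0.9, 0.2, 0.6, 0.2]},
    }
    print(compare_layers(ambiguous, alternatives))
```

Reading out the closest alternative per layer mirrors the abstract's layer-by-layer framing; in practice one would also need many stimulus variants to test whether the preference is reliable.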
