Approximate number representations emerge in object-trained convolutional neural networks and show human-like signatures of number discrimination

Daniel Janini, Talia Konkle

doi:10.1167/jov.20.11.1120

Abstract

What are the visual input analyzers that yield numerosity representations? Recent work from Nasr et al. (2019) provides an interesting possibility: numerosity representations are implemented by the same cortical networks that can classify objects. They found that a convolutional neural network trained on object classification had units with tuning curves for numerosity, similar to neurons in primate parietal and frontal cortex. Here, we extend these findings, examining whether the neural network’s number representations are tolerant to stimulus variation and show signatures of human number perception. We recorded responses to dot displays in each unit of AlexNet trained on 1000-way object classification. A subset of units in AlexNet had Gaussian tuning curves for number, with wider tuning curves for higher preferred numerosities. Tuning curves were stable across stimulus sets controlling for surface area, density, convex hull, total circumference, and dot radius. Extending previous findings, we also observed that the tuning curves were maintained even in textured dot displays (for example, fur-textured dots on a grass-textured background). These results were replicated in another architecture (VGG16) and, critically, were not evident in an untrained network. Next, we tested whether AlexNet’s number tuning was susceptible to grouping effects similar to those of the human visual system. Both humans and AlexNet underestimated the numerosity of displays with dots grouped into pairs relative to displays with randomly arranged dots. We also created images in which lines connected pairs of dots, decreasing the number of continuous objects. As in humans, AlexNet’s number estimates decreased as dots were connected. Altogether, these results indicate that neural networks trained on object recognition gain robust number representations. Moreover, these representations are influenced by spatial grouping and connectedness, matching properties of human behavior. These results support the view that the same input analyzers that untangle object categories from retinal input also yield approximate number representations.
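As a rough illustration of what a Gaussian tuning curve with width scaling means, the sketch below summarizes a unit's preferred numerosity and tuning width from its mean responses. It is not the authors' analysis: the responses are synthetic, the function names `gaussian_tuning` and `summarize_tuning` are hypothetical, the width-to-preferred ratio of 0.25 is an arbitrary assumption, and the moment-based width is a simple stand-in for a proper curve fit.

```python
import math


def gaussian_tuning(n, preferred, sigma):
    """Idealized response of a unit to numerosity n (assumed model)."""
    return math.exp(-((n - preferred) ** 2) / (2 * sigma ** 2))


def summarize_tuning(numerosities, responses):
    """Return (preferred numerosity, tuning width) for one unit.

    Preferred numerosity is the numerosity with the largest response.
    Width is the response-weighted standard deviation, a moment-based
    approximation to the sigma of a Gaussian fit.
    """
    baseline = min(responses)
    weights = [r - baseline for r in responses]
    total = sum(weights)
    if total == 0:
        raise ValueError("flat response profile: unit is not tuned")
    preferred = numerosities[responses.index(max(responses))]
    mean = sum(n * w for n, w in zip(numerosities, weights)) / total
    var = sum(w * (n - mean) ** 2 for n, w in zip(numerosities, weights)) / total
    return preferred, math.sqrt(var)


numerosities = list(range(1, 31))

# Synthetic units whose width grows with preferred numerosity.
# The 0.25 ratio is an illustrative assumption, not a reported value.
units = {
    p: [gaussian_tuning(n, p, 0.25 * p) for n in numerosities]
    for p in (4, 8, 16)
}

summary = {p: summarize_tuning(numerosities, r) for p, r in units.items()}
for p, (pref, width) in summary.items():
    print(f"unit tuned to {p}: preferred={pref}, width={width:.2f}")
```

In practice the `responses` list would hold a network unit's mean activation to dot displays of each numerosity, averaged over the stimulus controls described above.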
