Abstract

Classically, visual processing is described as a cascade of local feedforward computations. Feedforward Convolutional Neural Networks (ffCNNs) have shown how powerful such models can be. However, using visual crowding as a well-controlled challenge, we previously showed that no classic model of vision, including ffCNNs, can explain human global shape processing. Here, we show that Capsule Neural Networks (CapsNets), combining ffCNNs with recurrent grouping and segmentation, solve this challenge. We also show that ffCNNs and standard recurrent CNNs do not, suggesting that the grouping and segmentation capabilities of CapsNets are crucial. Furthermore, we provide psychophysical evidence that grouping and segmentation are implemented recurrently in humans, and show that CapsNets reproduce these results well. We discuss why recurrence seems needed to implement grouping and segmentation efficiently. Together, we provide mutually reinforcing psychophysical and computational evidence that a recurrent grouping and segmentation process is essential to understand the visual system and create better models that harness global shape computations.

Highlights

  • The visual system is often seen as a hierarchy of local feedforward computations [1], going back to the seminal work of Hubel and Wiesel [2]

  • We show that Capsule Neural Networks (CapsNets), combining Feedforward Convolutional Neural Networks (ffCNNs) with recurrent grouping and segmentation, solve the visual crowding challenge that classic models of vision, including ffCNNs, fail to explain

  • Crowding (Fig 1a): the perception of visual elements deteriorates in clutter, an effect called crowding


Summary

Introduction

The visual system is often seen as a hierarchy of local feedforward computations [1], going back to the seminal work of Hubel and Wiesel [2]. It was suggested that ffCNNs mainly focus on local, texture-like features, while humans harness global shape computations ([9,13,14,15,16,17]; but see [18]). In this context, it was shown that changing local features of an object, such as its texture or edges, leads ffCNNs to misclassify it [13,14], while humans can still classify the object based on its global shape.
