Abstract

Experimental studies have revealed evidence of both parts-based and holistic representations of objects and faces in the primate visual system. However, it is still a mystery how such seemingly contradictory types of processing can coexist within a single system. Here, we propose a novel theory called mixture of sparse coding models, inspired by the formation of category-specific subregions in the inferotemporal (IT) cortex. We developed a hierarchical network that constructed a mixture of two sparse coding submodels on top of a simple Gabor analysis. The submodels were each trained with face or non-face object images, which resulted in separate representations of facial parts and object parts. Importantly, evoked neural activities were modeled by Bayesian inference, which had a top-down explaining-away effect that enabled recognition of an individual part to depend strongly on the category of the whole input. We show that this explaining-away effect was indeed crucial for the units in the face submodel to exhibit significant selectivity to face images over object images in a similar way to actual face-selective neurons in the macaque IT cortex. Furthermore, the model explained, qualitatively and quantitatively, several tuning properties to facial features found in the middle patch of face processing in IT as documented by Freiwald, Tsao, and Livingstone (2009). These included, in particular, tuning to only a small number of facial features that were often related to geometrically large parts like face outline and hair, preference and anti-preference of extreme facial features (e.g., very large/small inter-eye distance), and reduction of the gain of feature tuning for partial face stimuli compared to whole face stimuli. Thus, we hypothesize that the coding principle of facial features in the middle patch of face processing in the macaque IT cortex may be closely related to mixture of sparse coding models.

Highlights

  • The variety of objects that we see every day is overwhelming, and how our visual system deals with such complexity is a long-standing problem

  • The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript

  • Classical psychology has often debated whether an object is represented as a combination of individual parts or as a whole [1]


Summary

Introduction

The variety of objects that we see every day is overwhelming, and how our visual system deals with such complexity is a long-standing problem. Experimental studies have revealed evidence of both parts-based and holistic processing in behaviors [1, 2] and in neural activities in higher visual areas [2, 3, 4, 5], somewhat favoring holistic representation for faces and parts-based representation for non-face objects [1, 2, 5]. We found that the mixture of sparse coding models gave rise to a form of holistic computation: recognition of the whole object depends on the individual parts, and recognition of a part depends on the whole. This is a Bayesian explaining-away effect: an input image is first interpreted independently by each sparse coding submodel, but the one offering the better interpretation is adopted and the other is dismissed. Even if a part of an input image is a potential facial feature (e.g., a half-moon-like shape), that feature would not be recognized as an actual facial feature (e.g., a mouth) if the whole image is a non-face object (Fig 1B).
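The explaining-away effect described above can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes random dictionaries in place of learned ones, an L1-penalized sparse code computed by ISTA, unit noise variance, equal priors over the two submodels, and a hard winner-take-all readout. All function and variable names (ista, energy, mixture_inference, D_face, D_object) are hypothetical.

```python
import numpy as np

def ista(x, D, lam=0.1, n_iter=200):
    """Sparse code of x under dictionary D (L1-penalized least squares, ISTA)."""
    L = np.linalg.norm(D, 2) ** 2          # Lipschitz constant of the gradient
    a = np.zeros(D.shape[1])
    for _ in range(n_iter):
        z = a - D.T @ (D @ a - x) / L      # gradient step on the squared error
        a = np.sign(z) * np.maximum(np.abs(z) - lam / L, 0.0)  # soft threshold
    return a

def energy(x, D, a, lam=0.1):
    """Negative log joint (up to a constant): reconstruction error plus sparsity."""
    return 0.5 * np.sum((x - D @ a) ** 2) + lam * np.sum(np.abs(a))

def mixture_inference(x, dictionaries, lam=0.1):
    """Each submodel interprets x independently; the better one explains away the other."""
    codes = [ista(x, D, lam) for D in dictionaries]
    energies = np.array([energy(x, D, a, lam) for D, a in zip(dictionaries, codes)])
    weights = np.exp(-(energies - energies.min()))   # equal priors assumed
    posterior = weights / weights.sum()
    winner = int(np.argmin(energies))
    # Assumption: hard selection, so the dismissed submodel's units stay silent.
    responses = [a if k == winner else np.zeros_like(a) for k, a in enumerate(codes)]
    return posterior, winner, responses

rng = np.random.default_rng(0)

def random_dictionary(dim, n_atoms):
    D = rng.standard_normal((dim, n_atoms))
    return D / np.linalg.norm(D, axis=0)   # unit-norm atoms

D_face = random_dictionary(100, 10)        # stand-in for learned facial parts
D_object = random_dictionary(100, 10)      # stand-in for learned object parts

coeffs = np.zeros(10)
coeffs[[1, 4, 7]] = 1.0
x = D_face @ coeffs                        # input composed of three "facial parts"

posterior, winner, responses = mixture_inference(x, [D_face, D_object])
print("winning submodel:", winner)
print("posterior over submodels:", posterior)
```

In this toy setting, units in the non-winning submodel are silenced even when some of their atoms partially match the input, which mirrors the half-moon example: a locally plausible "mouth" is not reported when the whole input is better explained as a non-face object.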
