Abstract

In order to better understand how the brain perceives faces, it is important to know what objective drives learning in the ventral visual stream. To answer this question, we model neural responses to faces in the macaque inferotemporal (IT) cortex with a deep self-supervised generative model, β-VAE, which disentangles sensory data into interpretable latent factors, such as gender or age. Our results demonstrate a strong correspondence between the generative factors discovered by β-VAE and those coded by single IT neurons, beyond that found for the baselines, including the handcrafted state-of-the-art model of face perception, the Active Appearance Model, and deep classifiers. Moreover, β-VAE is able to reconstruct novel face images using signals from just a handful of cells. Together, our results imply that optimising the disentangling objective leads to representations that closely resemble those in IT cortex at the single-unit level. This points to disentangling as a plausible learning objective for the visual brain.

Highlights

  • In order to better understand how the brain perceives faces, it is important to know what objective drives learning in the ventral visual stream.

  • If the computations employed in biological sensory systems resemble those employed by this class of deep generative model to disentangle the visual world, the tuning properties of single neurons should map readily onto the meaningful latent units discovered by the β-variational autoencoder (β-VAE).

  • We first investigated whether the variation in average spike rates of any of the individual recorded neurons was explained by the activity in single units of a trained β-VAE that learnt to “disentangle” the same face dataset that was presented to the primates (see the analysis sketch after this list).
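To make the mapping in the last highlight concrete, below is a minimal sketch of how one could score each recorded neuron against each β-VAE latent unit. The function name, the array shapes, and the use of Pearson correlation are illustrative assumptions, not the paper's exact analysis pipeline.

```python
import numpy as np

def best_matching_latents(spike_rates, latents):
    """For each neuron, find the beta-VAE latent unit whose activations
    across the shared face images best explain the neuron's
    trial-averaged spike rates (Pearson correlation).

    spike_rates : (n_images, n_neurons) average responses per face image
    latents     : (n_images, n_latents) beta-VAE latent means for the
                  same images
    """
    # z-score both matrices so correlation reduces to a scaled dot product
    r = (spike_rates - spike_rates.mean(0)) / spike_rates.std(0)
    z = (latents - latents.mean(0)) / latents.std(0)
    corr = r.T @ z / len(r)                # (n_neurons, n_latents)
    best = np.abs(corr).argmax(axis=1)     # best latent index per neuron
    return best, corr[np.arange(len(best)), best]
```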


Introduction

In order to better understand how the brain perceives faces, it is important to know what objective drives learning in the ventral visual stream. Recent advances in machine learning have offered an implementational blueprint for one candidate objective with the advent of deep self-supervised generative models that learn to “disentangle” high-dimensional sensory signals into meaningful factors of variation. One such model, known as the beta-variational autoencoder (β-VAE), learns to faithfully reconstruct sensory data from a low-dimensional embedding whilst being regularised in a way that encourages individual network units to code for semantically meaningful variables, such as the colour of an object, the gender of a face, or the arrangement of a scene (Fig. 1a–c) [17,18,19]. The β-VAE learns using a general self-supervised objective, without relying on the high-density teaching signals required by deep classifiers, which makes it more biologically plausible.
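The regularisation that drives this behaviour is a single scalar β that re-weights the KL term of the usual variational autoencoder objective. Below is a minimal sketch of that loss in PyTorch; the Gaussian likelihood (mean-squared-error reconstruction) and the default β value are illustrative assumptions, not the exact configuration used in the paper.

```python
import torch
import torch.nn.functional as F

def beta_vae_loss(x, x_recon, mu, log_var, beta=4.0):
    """beta-VAE objective: reconstruction error plus a beta-weighted KL
    divergence between the approximate posterior q(z|x) = N(mu, exp(log_var))
    and the unit-Gaussian prior p(z). Setting beta > 1 pressures the latent
    units towards independence, which encourages disentangled codes."""
    # -log p(x|z) up to a constant, assuming a Gaussian likelihood
    recon = F.mse_loss(x_recon, x, reduction="sum")
    # Analytic KL between a diagonal Gaussian and the unit Gaussian
    kl = -0.5 * torch.sum(1.0 + log_var - mu.pow(2) - log_var.exp())
    return recon + beta * kl
```

With β = 1 this reduces to the standard VAE evidence lower bound; raising β trades reconstruction fidelity for latent units that each tend to track a single factor of variation, such as face gender or age.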
