Abstract

Reflectance, lighting and geometry combine in complex ways to create images. How do we disentangle these to perceive individual properties, such as surface glossiness? We suggest that brains disentangle properties by learning to model statistical structure in proximal images. To test this hypothesis, we trained unsupervised generative neural networks on renderings of glossy surfaces and compared their representations with human gloss judgements. The networks spontaneously cluster images according to distal properties such as reflectance and illumination, despite receiving no explicit information about these properties. Intriguingly, the resulting representations also predict the specific patterns of ‘successes’ and ‘errors’ in human perception. Linearly decoding specular reflectance from the model’s internal code predicts human gloss perception better than ground truth, supervised networks or control models, and it predicts, on an image-by-image basis, illusions of gloss perception caused by interactions between material, shape and lighting. Unsupervised learning may underlie many perceptual dimensions in vision and beyond.
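To make the decoding analysis concrete, here is a minimal sketch of reading out specular reflectance linearly from a learned latent code and comparing that read-out with human judgements. This is not the authors' code: the arrays `latent_codes`, `specular_reflectance` and `human_gloss_ratings` are hypothetical placeholders filled with random stand-in data, and the readout is ordinary least squares via scikit-learn.

```python
# Minimal sketch of linear decoding from an unsupervised model's latent code.
# All arrays are hypothetical stand-ins, not the paper's data:
#   latent_codes:         (n_images, n_latents) codes from the trained network
#   specular_reflectance: (n_images,) ground-truth reflectance of each rendering
#   human_gloss_ratings:  (n_images,) mean human gloss judgement per image
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_predict
from scipy.stats import pearsonr

rng = np.random.default_rng(0)
n_images, n_latents = 500, 10
latent_codes = rng.normal(size=(n_images, n_latents))   # stand-in data
specular_reflectance = rng.uniform(size=n_images)       # stand-in data
human_gloss_ratings = rng.uniform(size=n_images)        # stand-in data

# Fit a linear readout of reflectance from the latent code, cross-validated
# so that each image is predicted by a model that never saw it.
decoded_gloss = cross_val_predict(
    LinearRegression(), latent_codes, specular_reflectance, cv=5
)

# The key comparison: does the decoded value track human judgements
# (including their systematic errors) better than ground truth does?
r_model, _ = pearsonr(decoded_gloss, human_gloss_ratings)
r_truth, _ = pearsonr(specular_reflectance, human_gloss_ratings)
print(f"model vs humans: r = {r_model:.2f}; truth vs humans: r = {r_truth:.2f}")
```

Cross-validated prediction matters here: without it, a readout fitted and evaluated on the same images could match human ratings for trivial reasons.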

Highlights

  • We suggest that brains disentangle properties by learning to model statistical structure in proximal images

  • Reflectance, lighting and geometry combine in complex ways to create images

  • Our findings show that mid-level perceptual dimensions, such as gloss, which imperfectly map onto properties of the physical world, can emerge spontaneously by learning to efficiently encode images



Introduction

We suggest that brains disentangle properties by learning to model statistical structure in proximal images. To test this hypothesis, we trained unsupervised generative neural networks on renderings of glossy surfaces and compared their representations with human gloss judgements. We show that, by learning to efficiently compress and spatially predict images of surfaces, an unsupervised generative deep neural network (DNN) spontaneously clusters inputs by distal factors such as material and illumination, and strikingly reproduces many characteristic ‘misperceptions’ of human observers. Identifying and disentangling these distal sources is possible only if a second principle holds: different distal sources must generate statistically distinguishable effects in the proximal input [18]. This seems intuitively true; for example, changes in illumination generate different kinds or patterns of variability in images than changes in surface material do. On the basis of these two principles, we reasoned that a sufficiently powerful statistical learning model should be able to discover the existence of distal variables without a priori knowledge of either the number or kinds of distal variables that exist in the world, solely on the basis of the variability they generate in images. A sketch of the kind of unsupervised objective at stake follows.
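The sketch below runs one training step of a generic convolutional variational autoencoder on a batch of unlabelled renderings. It is a simplified stand-in, not the paper's architecture: the 64x64 greyscale input, layer sizes and hyperparameters are all assumptions, and the spatial-prediction component of the decoder is omitted; only the pressure to compress images through a low-dimensional code is shown.

```python
# Minimal sketch of an unsupervised generative model trained to compress
# images, as a stand-in for the network described above. Architecture and
# hyperparameters are illustrative assumptions, not the paper's.
import torch
import torch.nn as nn
import torch.nn.functional as F

class ConvVAE(nn.Module):
    def __init__(self, n_latents: int = 10):
        super().__init__()
        # Encoder: 64x64 greyscale image -> mean and log-variance of the code.
        self.enc = nn.Sequential(
            nn.Conv2d(1, 32, 4, stride=2, padding=1), nn.ReLU(),   # -> 32x32
            nn.Conv2d(32, 64, 4, stride=2, padding=1), nn.ReLU(),  # -> 16x16
            nn.Flatten(),
        )
        self.to_mu = nn.Linear(64 * 16 * 16, n_latents)
        self.to_logvar = nn.Linear(64 * 16 * 16, n_latents)
        # Decoder: low-dimensional code -> reconstructed image.
        self.from_z = nn.Linear(n_latents, 64 * 16 * 16)
        self.dec = nn.Sequential(
            nn.ConvTranspose2d(64, 32, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(32, 1, 4, stride=2, padding=1), nn.Sigmoid(),
        )

    def forward(self, x):
        h = self.enc(x)
        mu, logvar = self.to_mu(h), self.to_logvar(h)
        z = mu + torch.randn_like(mu) * torch.exp(0.5 * logvar)  # reparameterize
        x_hat = self.dec(self.from_z(z).view(-1, 64, 16, 16))
        return x_hat, mu, logvar

def vae_loss(x, x_hat, mu, logvar):
    # Reconstruction error plus KL divergence to a standard normal prior:
    # the compression pressure that forces the code to model image structure.
    recon = F.binary_cross_entropy(x_hat, x, reduction="sum")
    kl = -0.5 * torch.sum(1 + logvar - mu.pow(2) - logvar.exp())
    return recon + kl

# One training step on a batch of unlabelled surface renderings
# (random stand-in images here; no labels are ever used).
model = ConvVAE()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
images = torch.rand(8, 1, 64, 64)
x_hat, mu, logvar = model(images)
loss = vae_loss(images, x_hat, mu, logvar)
opt.zero_grad()
loss.backward()
opt.step()
```

After training, the encoder's `mu` for each image serves as the latent code from which distal properties such as reflectance could be linearly decoded, as in the sketch following the abstract above.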

