Abstract

Though the range of invariance in recognition of novel objects is a basic aspect of human vision, its characterization has remained surprisingly elusive. Here we report tolerance to scale and position changes in one-shot learning by measuring recognition accuracy of Korean letters presented in a flash to non-Korean subjects who had no previous experience with Korean letters. We found that humans have significant scale-invariance after only a single exposure to a novel object. The range of translation-invariance is limited, depending on the size and position of presented objects. To understand the underlying brain computation associated with the invariance properties, we compared experimental data with computational modeling results. Our results suggest that to explain invariant recognition of objects by humans, neural network models should explicitly incorporate built-in scale-invariance, by encoding different scale channels as well as eccentricity-dependent representations captured by neurons’ receptive field sizes and sampling density that change with eccentricity. Our psychophysical experiments and related simulations strongly suggest that the human visual system uses a computational strategy that differs in some key aspects from current deep learning architectures, being more data efficient and relying more critically on eye-movements.

Highlights

  • Though the range of invariance in recognition of novel objects is a basic aspect of human vision, its characterization has remained surprisingly elusive

  • Findings from previous studies range from “This result suggests that the visual system does not apply a global transposition transformation to the retinal image to compensate for translations”[9], to “For animal-like shapes, we found complete translation invariance”[10], to “Our results demonstrate that position invariance, a widely acknowledged property of the human visual system, is limited to specific experimental conditions”[11]

  • For the one-shot learning task, we flashed a target Korean letter and a test Korean letter, which was either the same as the target or a different distractor, to non-Korean subjects who were unfamiliar with Korean letters (a sketch of this trial structure follows the list)
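The following Python sketch lays out the same/different trial structure described in the last highlight above. It is only an illustration of the paradigm, not the authors’ experiment code: the letter set, flash duration, letter sizes, and test eccentricities are placeholder assumptions, since the excerpt does not specify them.

```python
import random

# Placeholder parameters -- the actual letter set, flash duration, sizes,
# and eccentricities are assumptions, not values from the paper.
KOREAN_LETTERS = ["가", "나", "다", "라", "마", "바", "사"]
FLASH_MS = 33                    # assumed brief presentation time
SCALES_DEG = [2, 4, 8]           # assumed letter sizes (degrees of visual angle)
ECCENTRICITIES_DEG = [0, 4, 8]   # assumed horizontal positions of the test letter

def make_trial(rng):
    """One same/different trial: a target letter is flashed, followed by a
    test letter that is either identical or a different (distractor) letter."""
    target = rng.choice(KOREAN_LETTERS)
    same = rng.random() < 0.5
    test = target if same else rng.choice([c for c in KOREAN_LETTERS if c != target])
    return {
        "target": target,
        "test": test,
        "same": same,
        "target_scale_deg": rng.choice(SCALES_DEG),
        "test_scale_deg": rng.choice(SCALES_DEG),                 # possible scale change
        "test_eccentricity_deg": rng.choice(ECCENTRICITIES_DEG),  # possible position change
        "flash_ms": FLASH_MS,
    }

def run_block(n_trials=50, seed=0):
    """Build a block of trials; a real experiment would flash each pair and
    record the subject's same/different response."""
    rng = random.Random(seed)
    return [make_trial(rng) for _ in range(n_trials)]

if __name__ == "__main__":
    for trial in run_block(3):
        print(trial)
```

Recognition accuracy on “same” versus “different” responses, grouped by the scale and position change between target and test, is what the tolerance measurements above refer to.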


Introduction

Though the range of invariance in recognition of novel objects is a basic aspect of human vision, its characterization has remained surprisingly elusive. Our results suggest that to explain invariant recognition of objects by humans, neural network models should explicitly incorporate built-in scale-invariance, by encoding different scale channels as well as eccentricity-dependent representations captured by neurons’ receptive field sizes and sampling density that change with eccentricity. It is important to distinguish between invariance due to the underlying representation, which we refer to as intrinsic invariance, and example-based invariance for familiar objects that have been previously seen under several different viewpoints. The latter is computationally trivial and is available to any recognition system with sufficient memory and large training data. The extent of intrinsic invariant recognition is still unknown (see [16,17,18] for studies on primate invariant recognition and [19] for human invariant recognition of familiar objects).
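To make the proposed representation concrete, the following NumPy sketch combines the two ingredients named above: explicit scale channels and an eccentricity-dependent sampling scheme in which receptive fields grow, and sampling becomes sparser, away from the fovea. All numerical choices here (the linear receptive-field growth law, ring spacing, scale factors, and pixel size) are illustrative assumptions, not parameters from the paper or from the authors’ model.

```python
import numpy as np

def rf_size_deg(ecc_deg, slope=0.5, rf0=0.3):
    """Receptive-field diameter (deg) that grows with eccentricity;
    the linear law and its constants are assumptions for illustration."""
    return rf0 + slope * ecc_deg

def eccentric_samples(max_ecc_deg=10.0, n_rings=8):
    """Sampling eccentricities spaced more sparsely toward the periphery,
    each paired with its (growing) receptive-field size."""
    eccs = np.geomspace(0.5, max_ecc_deg, n_rings)
    return [(e, rf_size_deg(e)) for e in eccs]

def scale_channels(image, scales=(1.0, 0.5, 0.25)):
    """Explicit scale channels: the same image represented at several
    resolutions, so a size change mainly shifts activity across channels."""
    return [image[::int(round(1 / s)), ::int(round(1 / s))] for s in scales]

def encode(image, deg_per_pix=0.05):
    """Average-pool each scale channel within eccentricity-dependent windows
    (along one meridian, for brevity) and concatenate into a feature vector.
    For simplicity, deg_per_pix is treated as fixed across channels."""
    feats = []
    for chan in scale_channels(image):
        h, w = chan.shape
        cy, cx = h // 2, w // 2
        for ecc, rf in eccentric_samples():
            r = max(1, int(rf / deg_per_pix))           # RF radius in pixels
            y = min(h - 1, cy + int(ecc / deg_per_pix))
            patch = chan[max(0, y - r):y + r, max(0, cx - r):cx + r]
            feats.append(patch.mean() if patch.size else 0.0)
    return np.array(feats)

if __name__ == "__main__":
    letter_image = np.random.rand(256, 256)             # stand-in for a flashed letter
    print(encode(letter_image).shape)
```

The design choice mirrors the behavioral pattern reported above: rescaling the stimulus mainly moves activity between scale channels, which is compatible with substantial one-shot scale tolerance, while translating the stimulus pushes it into coarser, more sparsely sampled peripheral receptive fields, which is compatible with a limited range of translation tolerance.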


