Abstract
Progress in understanding the brain mechanisms underlying vision requires the construction of computational models that not only emulate the brain's anatomy and physiology, but ultimately match its performance on visual tasks. In recent years, “natural” images have become popular in the study of vision and have been used to show apparently impressive progress in building such models. Here, we challenge the use of uncontrolled “natural” images in guiding that progress. In particular, we show that a simple V1-like model—a neuroscientist's “null” model, which should perform poorly at real-world visual object recognition tasks—outperforms state-of-the-art object recognition systems (biologically inspired and otherwise) on a standard, ostensibly natural image recognition test. As a counterpoint, we designed a “simpler” recognition test to better span the real-world variation in object pose, position, and scale, and we show that this test correctly exposes the inadequacy of the V1-like model. Taken together, these results demonstrate that tests based on uncontrolled natural images can be seriously misleading, potentially guiding progress in the wrong direction. Instead, we reexamine what it means for images to be natural and argue for a renewed focus on the core problem of object recognition—real-world image variation.
Highlights
Visual object recognition is an extremely difficult computational problem
The ease with which we recognize visual objects belies the computational difficulty of this feat
At the core of this challenge is image variation: any given object can cast an infinite number of different images onto the retina, depending on the object’s position, size, orientation, pose, lighting, etc.
Summary
Visual object recognition is an extremely difficult computational problem. The core problem is that each object in the world can cast an infinite number of different 2-D images onto the retina as the object’s position, pose, lighting, and background vary relative to the viewer (e.g., [1]). Progress in understanding the brain’s solution to object recognition requires the construction of artificial recognition systems that aim to emulate our own visual abilities, often with biological inspiration (e.g., [2,3,4,5,6]). Such computational approaches are critically important because they can provide experimentally testable hypotheses, and because instantiation of a working recognition system represents an effective measure of success in understanding object recognition. Artificial systems should be able to do what our own visual systems can, but it is unclear how to evaluate progress toward this goal. This amounts to choosing an image set against which to test performance.
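The image-variation problem described above can be illustrated with a toy sketch. This is not the paper's V1-like model or its test set. The `render` and `pixel_distance` helpers, the 32x32 canvas, and the two shapes are all assumptions made for illustration. The sketch shows that, in raw pixel space, the same object at two positions can be farther apart than two different objects at the same position.

```python
import numpy as np


def render(shape, cx, cy, size, canvas=32):
    """Draw a binary image of a hypothetical 'object' (square or disk)."""
    ys, xs = np.mgrid[0:canvas, 0:canvas]
    if shape == "square":
        mask = (np.abs(xs - cx) <= size) & (np.abs(ys - cy) <= size)
    elif shape == "disk":
        mask = (xs - cx) ** 2 + (ys - cy) ** 2 <= size ** 2
    else:
        raise ValueError(f"unknown shape: {shape}")
    img = np.zeros((canvas, canvas))
    img[mask] = 1.0
    return img


def pixel_distance(a, b):
    """Euclidean distance between two images treated as pixel vectors."""
    return float(np.linalg.norm(a - b))


# Same object, two positions (identity-preserving variation).
square_a = render("square", 10, 10, 4)
square_b = render("square", 22, 22, 4)

# Different object, same position as square_a.
disk_a = render("disk", 10, 10, 4)

within_object = pixel_distance(square_a, square_b)
between_object = pixel_distance(square_a, disk_a)

print("same object, shifted:     ", round(within_object, 2))
print("different object, aligned:", round(between_object, 2))
```

In this toy case a change in position alone moves the image farther in pixel space than a change in object identity does. An image set in which position, scale, and pose barely vary would hide this effect, so a test that does not span such variation may not probe the core problem at all.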