Abstract
Humans recognize visual objects rapidly and effortlessly. Object categorization is commonly believed to arise from the interaction between bottom-up and top-down cognitive processing. In ultra-rapid categorization, stimuli appear briefly and response time is limited. In this scenario, a first feedforward sweep of information is assumed to be sufficient to discriminate whether or not an object is present in a scene. However, whether and how feedback (top-down) processing is involved within such a brief duration remains an open question. Here, we examine how different top-down manipulations, namely category level, category type, and real-world size, interact in ultra-rapid categorization. We constructed a dataset of real-world scene images with a built-in measurement of target object display size, and used it to measure ultra-rapid object categorization performance in human subjects. For auxiliary comparison, we also evaluated standard feedforward computational models of scene features and a state-of-the-art object detection model. The results show effects of 1) animacy (animal, vehicle, food), 2) level of abstraction (people, sport), and 3) real-world size (four target size levels) on ultra-rapid categorization. These findings support the involvement of top-down processing when certain objects are categorized rapidly, such as sport at a fine-grained level. Our human vs. model comparisons also shed light on how the two approaches might be combined and integrated, which may interest both experimental and computational vision researchers. All collected images and behavioral data, as well as code and models, are publicly available at https://osf.io/mqwjz/.
Highlights
Visual recognition of objects by humans is often rapid and seemingly effortless [1,2,3]
This study investigated the influence of 1) animacy, 2) level of abstraction, and 3) real-world size on ultra-rapid object categorization
We show median reaction time (RT) distribution histograms for different categories in Fig 3 and analyzed the effects of level of abstraction and animacy
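The following sketch shows one plausible way to compute per-category median RT histograms. It is not the authors' analysis code, and it rests on several assumptions. First, each trial is a hypothetical (subject, category, rt_ms, correct) record. Second, only correct trials are analyzed. Third, each subject contributes one median RT per category, and those medians are binned at a fixed width. The function names `median_rts` and `histogram` and the 20 ms bin width are illustrative choices.

```python
from collections import defaultdict
from statistics import median


def median_rts(trials):
    """Return {category: {subject: median RT in ms}} over correct trials.

    Assumption: `trials` is a list of hypothetical
    (subject, category, rt_ms, correct) tuples.
    """
    grouped = defaultdict(lambda: defaultdict(list))
    for subject, category, rt_ms, correct in trials:
        if correct:  # assumption: only correct responses enter the RT analysis
            grouped[category][subject].append(rt_ms)
    return {
        cat: {subj: median(rts) for subj, rts in subjects.items()}
        for cat, subjects in grouped.items()
    }


def histogram(values, bin_width=20, start=0):
    """Count values into fixed-width bins; returns {bin_start: count}."""
    counts = defaultdict(int)
    for v in values:
        bin_start = start + int((v - start) // bin_width) * bin_width
        counts[bin_start] += 1
    return dict(sorted(counts.items()))


if __name__ == "__main__":
    # Made-up example trials, for illustration only (not study data).
    trials = [
        ("s1", "animal", 410, True), ("s1", "animal", 430, True),
        ("s1", "animal", 900, False), ("s2", "animal", 455, True),
        ("s2", "animal", 475, True), ("s1", "sport", 520, True),
        ("s2", "sport", 560, True),
    ]
    medians = median_rts(trials)
    for category, per_subject in medians.items():
        print(category, histogram(per_subject.values()))
```

Computing a median per subject before binning keeps a few very slow trials from dominating a subject's contribution. Binning trial-level RTs directly would be an equally reasonable alternative.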
Summary
Visual recognition of objects by humans is often rapid and seemingly effortless [1,2,3]. Humans can accurately make judgments about briefly presented scenes, such as the presence of a target category and its referent location [1]. It is possible to reliably detect objects in the central visual field within a single fixation in less than 200 ms [3].