Abstract

Humans can categorize objects in complex natural scenes within 100–150 ms. This amazing ability of rapid categorization has motivated many computational models. Most of these models require extensive training to obtain a decision boundary in a very high dimensional (e.g., ∼6,000 in a leading model) feature space and often categorize objects in natural scenes by categorizing the context that co-occurs with objects when objects do not occupy large portions of the scenes. It is thus unclear how humans achieve rapid scene categorization.To address this issue, we developed a hierarchical probabilistic model for rapid object categorization in natural scenes. In this model, a natural object category is represented by a coarse hierarchical probability distribution (PD), which includes PDs of object geometry and spatial configuration of object parts. Object parts are encoded by PDs of a set of natural object structures, each of which is a concatenation of local object features. Rapid categorization is performed as statistical inference. Since the model uses a very small number (∼100) of structures for even complex object categories such as animals and cars, it requires little training and is robust in the presence of large variations within object categories and in their occurrences in natural scenes. Remarkably, we found that the model categorized animals in natural scenes and cars in street scenes with a near human-level performance. We also found that the model located animals and cars in natural scenes, thus overcoming a flaw in many other models which is to categorize objects in natural context by categorizing contextual features. These results suggest that coarse PDs of object categories based on natural object structures and statistical operations on these PDs may underlie the human ability to rapidly categorize scenes.

Highlights

  • Humans can remember extraordinarily rich details in thousands of scenes viewed for a very brief period [1]

  • We developed a hierarchical probabilistic model (Figure 1)

  • An object is conceived to consist of multiple parts and each part consist of a set of natural object structures, each of which is a concatenation of local features in a small region of the object

Read more

Summary

Introduction

Humans can remember extraordinarily rich details in thousands of scenes viewed for a very brief period [1]. Humans can grasp the gist of complex natural scenes very quickly [2,3,4]. This is often called rapid scene categorization since it requires little or no attention and top-down feedback plays a limited role. This amazing ability challenges the traditional view of visual information processing in several major ways. Low-level visual features including edges, junctions, and various image gradients are insufficient for revealing the content of complex natural scenes. The computation needed to build such symbolic representations seems too timeconsuming for rapid scene categorization

Methods
Results
Conclusion
Full Text
Published version (Free)

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call