Abstract

In this work we describe a general probabilistic framework that accounts for the deployment of attention in complex natural images, scene recognition, and object detection. The framework integrates into a single model three factors for attention guidance (Torralba 2003; Oliva, Torralba, Castelhano, & Henderson 2003): bottom-up saliency (based on low-level image properties), target-driven search (a model of the appearance of the target object), and global scene priors (top-down priors provided by the gist of the scene). Using a Bayesian framework, we show how to determine (1) which locations and scales in the image are the best candidates to contain objects of interest, (2) how the expected appearance of the target modulates the saliency of local image regions, and (3) how scene priors modulate the saliency of image regions early during the visual search task. The Bayesian model can be used to study different tasks by marginalizing over the irrelevant variables: (I) object detection (a two-alternative forced-choice task: is the target present in the image?) by marginalizing over location, (II) object localization (a visual search task), and (III) scene exploration (free viewing of an image) by marginalizing over all object classes. We study how each factor of the model (saliency, object model, and scene priors) contributes to explaining subject performance in each task.
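As a minimal numerical sketch of the idea described above, the three guidance factors can be combined multiplicatively into a posterior map over image locations, and each task then corresponds to a different marginalization of that map. All array names, map shapes, and the toy values below are illustrative assumptions, not quantities from the paper:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical per-location maps over a small image grid (all illustrative):
#   saliency    -- bottom-up factor from low-level image properties
#   appearance  -- likelihood of the target's expected appearance at each location
#   scene_prior -- top-down prior from the scene gist (here favoring some rows)
H, W = 8, 8
saliency = rng.random((H, W))
appearance = rng.random((H, W))
scene_prior = np.exp(-((np.arange(H) - 5) ** 2) / 8.0)[:, None] * np.ones((1, W))

# Posterior over locations: product of the three guidance factors, normalized.
posterior = saliency * appearance * scene_prior
posterior /= posterior.sum()

# Task I: object detection -- marginalize over location to score target presence.
detection_score = (saliency * appearance * scene_prior).sum()

# Task II: object localization -- take the mode of the posterior map.
best_loc = np.unravel_index(np.argmax(posterior), posterior.shape)

# Task III: scene exploration (free viewing) -- marginalize over object classes,
# which in this toy version leaves only the bottom-up saliency term.
exploration_map = saliency / saliency.sum()
```

This sketch collapses scale and treats the factors as independent per-location maps; the full model in the paper also reasons over scales and object classes.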
