Abstract

Bayesian observer models have seen widespread success in predicting fixation locations in visual search tasks with artificial stimuli, such as Gabors in 1/f noise (Najemnik & Geisler, 2005), but comparatively little success in predicting fixation locations during search for natural categorical targets in real-world scenes. Critically, previous approaches have neither accounted for the effects of foveation nor implemented decision rules to select a sequence of fixations. Here we present a Bayesian model of fixation selection in visual search tasks using natural images. The model uses two known sources of information to select fixations: scene context and the spatial distribution of target-like features (the target-relevant feature distribution). Scene context functions as a prior over potential target locations, while the target-relevant feature distribution is used to compute the likelihood function. In line with previous approaches (Torralba, Oliva, Castelhano, & Henderson, 2006; Ehinger, Hidalgo-Sotelo, Torralba, & Oliva, 2009), we use GIST features to characterize scene context and Histograms of Oriented Gradients (Dalal & Triggs, 2005) to characterize target-relevant features. In addition, we account for the effects of foveation and combine these information sources within a sequential Bayesian updating framework (similar to Najemnik & Geisler, 2005). Using scene context or the target-relevant feature distribution alone, the model performs quite well (greater than 90% localization performance and greater than 95% classification performance, respectively). To compare the model's fixation selections with human behavior, we tested human observers on a pedestrian search task in natural images. Before the search task, visibility maps were measured for each human observer; these maps were used to degrade the target-relevant feature information in the model simulations, replicating the effects of foveation. We compare the model's selected fixations with those of human observers and discuss the implications for what information humans should use, and how they should combine it, when selecting fixations.

Meeting abstract presented at VSS 2018
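To make the pipeline concrete, the following is a minimal sketch (in Python/NumPy, not the authors' code) of the sequential Bayesian updating step described above: a scene-context prior over candidate target locations is combined with foveated, Gaussian-noise-corrupted feature evidence, and the next fixation is chosen at the posterior maximum. The grid size, the exponential visibility falloff, the d' values, and the maximum-a-posteriori (MAP) fixation rule are all illustrative assumptions; in the actual model the prior comes from GIST features, the evidence from HOG template responses, and the reliability falloff from each observer's measured visibility maps.

import numpy as np

rng = np.random.default_rng(0)
H, W = 16, 16                                  # coarse grid of candidate target locations

# Scene-context prior (stand-in for the GIST-based context model): a smooth
# bias toward the lower half of the image, normalized to sum to 1.
ys = np.arange(H)[:, None]
prior = np.exp(-0.5 * ((ys - 11.0) / 4.0) ** 2) * np.ones((H, W))
prior /= prior.sum()

true_loc = (12, 5)                             # ground-truth target cell (simulation only)

def visibility(fix, sigma=5.0):
    # Stand-in visibility map: evidence reliability (d') falls off with
    # eccentricity from the current fixation, as measured visibility maps do.
    yy, xx = np.mgrid[0:H, 0:W]
    ecc = np.hypot(yy - fix[0], xx - fix[1])
    return 3.0 * np.exp(-ecc / sigma)

log_post = np.log(prior)                       # start from the context prior
fix = (H // 2, W // 2)                         # first fixation at the image center
for t in range(6):
    dprime = visibility(fix)
    # Noisy target-relevant feature responses (stand-in for HOG template
    # responses): mean d' at the true target location, 0 elsewhere, unit variance.
    signal = np.zeros((H, W))
    signal[true_loc] = 1.0
    obs = dprime * signal + rng.standard_normal((H, W))
    # Gaussian log-likelihood ratio ("target here" vs. "no target here"),
    # accumulated across fixations: sequential Bayesian updating.
    log_post += dprime * obs - 0.5 * dprime ** 2
    post = np.exp(log_post - log_post.max())
    post /= post.sum()
    fix = tuple(int(v) for v in np.unravel_index(post.argmax(), post.shape))
    print(f"fixation {t + 1}: {fix}, posterior at target = {post[true_loc]:.3f}")

Note that the MAP rule here simply fixates the currently most probable location; Najemnik and Geisler's ideal searcher instead selects the fixation expected to maximize the probability of correctly localizing the target, which generally yields a different fixation sequence.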
