Abstract
During free-viewing of natural scenes, eye movements are guided by bottom-up factors inherent to the stimulus, as well as top-down factors inherent to the observer. The question of how these two different sources of information interact and contribute to fixation behavior has recently received considerable attention. Here, a battery of 15 visual stimulus features was used to quantify the contribution of stimulus properties during free-viewing of 4 different categories of images (Natural, Urban, Fractal and Pink Noise). Behaviorally relevant information was estimated in the form of topographical interestingness maps by asking an independent set of subjects to click on the image regions that they subjectively found most interesting. Using a Bayesian scheme, we computed saliency functions that described the probability of fixation given the value of a feature. In the case of stimulus features, the precise shape of the saliency functions depended strongly on image category, and overall the saliency associated with these features was generally weak. When testing multiple features jointly, a linear additive integration model of individual saliencies performed satisfactorily. We found that the saliency associated with interesting locations was much higher than that of any low-level image feature and any pair-wise combination thereof. Furthermore, the low-level image features were found to be maximally salient at those locations that already had high interestingness ratings. Temporal analysis showed that regions with high interestingness ratings were fixated as early as the third fixation following stimulus onset. Paralleling these findings, fixation durations were found to depend mainly on interestingness ratings and to a lesser extent on the low-level image features.
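The Bayesian scheme described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes a saliency function of the form p(fixation | feature value) ∝ p(feature value | fixation) / p(feature value), estimated by comparing the histogram of feature values at fixated pixels against the histogram over all pixels, and a simple weighted sum for the linear additive integration of individual saliencies. All function and parameter names are hypothetical.

```python
import numpy as np

def saliency_function(feature_map, fix_rows, fix_cols, n_bins=20):
    """Estimate p(fixation | feature value) via Bayes' rule:
    p(fix | f) is proportional to p(f | fix) / p(f).

    Hypothetical sketch: p(f) is the histogram of feature values over
    all pixels, p(f | fix) the histogram at fixated pixels only."""
    edges = np.linspace(feature_map.min(), feature_map.max(), n_bins + 1)
    p_f, _ = np.histogram(feature_map, bins=edges, density=True)
    p_f_fix, _ = np.histogram(feature_map[fix_rows, fix_cols],
                              bins=edges, density=True)
    # Bins never observed in the full image get zero saliency.
    sal = np.where(p_f > 0, p_f_fix / np.maximum(p_f, 1e-12), 0.0)
    return edges, sal

def additive_saliency(saliencies, weights=None):
    """Linear additive integration of individual feature saliencies."""
    s = np.asarray(saliencies, dtype=float)
    w = np.ones(len(s)) if weights is None else np.asarray(weights, float)
    return (w[:, None] * s).sum(axis=0)
```

For example, if fixations land only on high-valued pixels of a feature map, the estimated saliency function rises toward the upper feature bins, reproducing the intuition that a feature is "salient" when fixated locations over-represent it relative to chance.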
Our results suggest that both low- and high-level sources of information play a significant role during exploration of complex scenes with behaviorally relevant information being more effective compared to stimulus features.
Highlights
The allocation of attention under natural viewing conditions is a complex phenomenon requiring the concerted activity of multiple neuronal levels, mobilizing a huge number of sensory and motor areas as well as subcortical structures.
Using a Bayesian framework, we quantified to what extent eye movements made by human subjects correlate with low-level image characteristics that are presumably extracted during sensory processing in the brain.
Among the three features that are sensitive to different kinds of symmetrical configurations (Fig. 3D), we found that Radial Symmetry features were more salient than Bilateral Symmetry features, and fixation points were preferentially located at image locations with a high degree of radial symmetry.
Summary
The allocation of attention under natural viewing conditions is a complex phenomenon requiring the concerted activity of multiple neuronal levels, mobilizing a huge number of sensory and motor areas as well as subcortical structures. Several independent factors operating in parallel interact and add considerable complexity to the generation, and hence the study, of eye movements under natural conditions. These include stimulus properties, the relevance of the information for the human observer and geometrical aspects [5]. The first conceptualization, namely bottom-up or stimulus-dependent vision, exclusively considers the information content embedded in the stimulus itself. This typically spans a large spectrum, covering local features from simple (such as orientation, luminance contrast, disparity) to complex (faces, cars, objects, body parts), as well as more distributed features such as symmetry and the arrangement of objects. This wide spectrum can be roughly divided into low-, mid- and high-level information, reflecting the cortical hierarchical organization from primary visual cortex to higher visual areas.