The structure of a scene provides global contextual information for directing gaze and complements local object information in saliency prediction. In this study, we explore how visual attention is affected by scene structure, namely openness, depth, and perspective. We first build an eye tracking dataset of 2500 natural scene images and collect gaze data via both eye tracking and mouse tracking. We then analyze scene layout properties and propose a set of scene structural features related to visual attention. These complementary features are integrated for saliency prediction. Our features are independent of, and can work together with, many computational modules; this work demonstrates the use of multiple kernel learning (MKL) as one way to integrate low- and high-level features. Experimental results show that our model outperforms existing methods and that our scene structural features improve the performance of other saliency models on outdoor scenes.
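The MKL integration mentioned above can be illustrated with a minimal sketch: each feature group (low-level vs. scene-structural) induces its own base kernel, and the kernels are combined into a single kernel for a classifier. This is not the authors' implementation; the feature names and data are synthetic, and the kernel weight is fixed here rather than learned jointly as in full MKL.

```python
import numpy as np
from sklearn.metrics.pairwise import rbf_kernel
from sklearn.svm import SVC

# Synthetic stand-ins for per-image features (hypothetical):
# low-level features (e.g. local contrast) and scene-structural
# features (e.g. openness / depth descriptors).
rng = np.random.RandomState(0)
X_low = rng.rand(40, 8)      # low-level feature group
X_struct = rng.rand(40, 3)   # scene-structural feature group
y = (X_low[:, 0] + X_struct[:, 0] > 1.0).astype(int)  # synthetic labels

# One base kernel per feature group.
K_low = rbf_kernel(X_low, gamma=1.0)
K_struct = rbf_kernel(X_struct, gamma=1.0)

# MKL combines base kernels as a weighted sum. Full MKL learns the
# weights jointly with the classifier; a fixed convex combination is
# used here only to show the mechanism.
beta = 0.5
K = beta * K_low + (1.0 - beta) * K_struct

clf = SVC(kernel="precomputed").fit(K, y)
train_acc = clf.score(K, y)
```

At prediction time, the same weighted kernel is computed between test and training samples before calling `clf.predict`.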