Abstract

Attention models for free viewing of images are commonly developed in a bottom-up (BU) manner. In this paper, however, we propose to incorporate memory-oriented top-down spatial attention cues into the model. The proposed generative saliency model probabilistically combines a BU module and a top-down (TD) module and can be applied to both static and dynamic scenes. For static scenes, the experience of attending to similar scenes stored in long-term memory is mimicked by linearly mapping a global feature of the scene to the long-term top-down (LTD) saliency. For dynamic scenes, we additionally incorporate the influence of short-term memory, forming an HMM-like chain that guides the attention distribution over time. Our improved BU module utilizes low-level feature contrast, spatial distribution, and location information to highlight salient regions. It is tested on 1000 benchmark images and outperforms state-of-the-art BU methods. The complete model is examined on two video datasets, one with manually labeled salient regions and another with recorded eye-movement fixations. Experimental results show that our model achieves significant improvement in predicting human visual attention compared with existing saliency models.
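To make the combination of the BU and LTD components concrete, the sketch below illustrates one plausible reading of the abstract for static scenes: a learned linear transform maps a global scene feature to an LTD saliency map, which is then fused with the BU map. The linear weight matrix, the feature dimensionality, and the product-style fusion are illustrative assumptions, not details specified by the paper.

```python
# Minimal sketch (not the authors' implementation): fusing a bottom-up (BU)
# saliency map with a long-term top-down (LTD) map obtained by linearly
# mapping a global scene feature. Shapes and the product fusion are assumed.

import numpy as np


def ltd_saliency(global_feature: np.ndarray, W: np.ndarray, map_shape: tuple) -> np.ndarray:
    """Map a global scene feature (length D) to an LTD saliency map via a
    hypothetical learned linear transform W of shape (H*W, D)."""
    s = W @ global_feature                 # flattened saliency scores
    s = np.clip(s.reshape(map_shape), 0, None)
    return s / (s.sum() + 1e-12)           # normalize to a distribution


def fuse_bu_td(bu_map: np.ndarray, td_map: np.ndarray) -> np.ndarray:
    """Combine BU and TD maps; here an elementwise product of the two maps,
    treated as unnormalized probabilities, followed by renormalization."""
    fused = bu_map * td_map
    return fused / (fused.sum() + 1e-12)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    H, W_, D = 24, 32, 128                 # map size and feature dim (assumed)
    global_feat = rng.random(D)
    W_linear = rng.random((H * W_, D))     # stands in for the learned mapping
    bu = rng.random((H, W_))               # stands in for the BU module output
    td = ltd_saliency(global_feat, W_linear, (H, W_))
    saliency = fuse_bu_td(bu, td)
    print(saliency.shape, round(saliency.sum(), 6))
```

For dynamic scenes, the abstract's HMM-like chain would extend this by conditioning the current frame's TD map on the previous frame's attention distribution (short-term memory); that temporal step is not shown here.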
