Abstract

Visual attention deployment mechanisms allow the Human Visual System to cope with an overwhelming amount of visual data by dedicating most of its processing power to objects of interest. The ability to automatically detect the areas of a visual scene that humans will attend to is of interest for a large number of applications, from video coding and video quality assessment to scene understanding. Consequently, visual saliency (bottom-up attention) models have generated significant scientific interest in recent years. Most recent work in this area addresses dynamic models of attention that operate on moving stimuli (videos) rather than the traditionally used still images. Visual saliency models are usually evaluated against ground-truth eye-tracking data collected from human subjects. However, few recently published approaches attempt to learn saliency from eye-tracking data and, to the best of our knowledge, none do so for dynamic saliency. This paper attempts to fill that gap and describes an approach to data-driven learning of a dynamic saliency model. A framework is proposed that enables the use of eye-tracking data to train an arbitrary machine learning algorithm on arbitrary features derived from the scene. We evaluate the methodology using features from a state-of-the-art dynamic saliency model and show how simple machine learning algorithms can be trained to distinguish between visually salient and non-salient parts of the scene.
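
To make the described training framework concrete, the sketch below shows one plausible way to fit a classifier that separates fixated ("salient") from non-fixated ("non-salient") locations. It is not the authors' implementation: the data structures (per-frame feature maps of shape H×W×D, per-frame lists of fixation coordinates), the negative-sampling scheme, and the choice of logistic regression are all illustrative assumptions; any learner and any feature set could be plugged in.

```python
# Minimal sketch, not the paper's code: learn a per-pixel saliency classifier
# from eye-tracking fixations. Assumes feature_maps is a list of (H, W, D)
# arrays (one per frame) and fixations is a list of (row, col) tuples per frame.
import numpy as np
from sklearn.linear_model import LogisticRegression


def build_training_set(feature_maps, fixations, n_negatives_per_frame=50, seed=0):
    """Positives: feature vectors at fixated pixels. Negatives: random other pixels."""
    rng = np.random.default_rng(seed)
    X, y = [], []
    for fmap, fix in zip(feature_maps, fixations):
        h, w, _ = fmap.shape
        for r, c in fix:                          # attended locations -> label 1
            X.append(fmap[r, c])
            y.append(1)
        for _ in range(n_negatives_per_frame):    # random non-fixated locations -> label 0
            r, c = rng.integers(h), rng.integers(w)
            if (r, c) not in fix:
                X.append(fmap[r, c])
                y.append(0)
    return np.asarray(X), np.asarray(y)


def train_saliency_classifier(feature_maps, fixations):
    """Any off-the-shelf learner works here; logistic regression is a placeholder."""
    X, y = build_training_set(feature_maps, fixations)
    return LogisticRegression(max_iter=1000).fit(X, y)


def predict_saliency_map(model, fmap):
    """Score every pixel of a frame and reshape the probabilities into a map."""
    h, w, d = fmap.shape
    probs = model.predict_proba(fmap.reshape(-1, d))[:, 1]
    return probs.reshape(h, w)
```

In this reading of the framework, the learned model replaces the hand-tuned combination stage of a saliency model: features remain arbitrary, and the eye-tracking data supplies the labels.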

