Abstract

Accurately modeling and predicting the visual attention behavior of human viewers can help a video analysis algorithm locate interesting regions, reducing the search effort of tasks such as object detection and recognition. In recent years, a great number and variety of visual attention models for predicting the direction of gaze on images and videos have been proposed. When a human views a video, the motion of objects in the scene and of the camera strongly affects the distribution of visual fixations. Here we develop models of this behavior that yield motion features extracted from videos, which are used in a new video saliency detection method called spatial–temporal weighted dissimilarity (STWD). For efficiency, frames are partitioned into blocks on which the saliency calculations are performed. Two spatial features, termed spatial dissimilarity and preference difference, are defined to characterize the spatial conspicuity of each block. The motion features extracted from each block are simple differences of motion vectors between adjacent frames. Finally, the spatial and motion features are combined to generate a saliency map for each frame. Experiments on three public video datasets containing 185 video clips and corresponding eye traces show that the proposed saliency detection method is highly competitive with, and delivers better performance than, state-of-the-art methods.
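
To make the block-based pipeline described above concrete, the following is a minimal sketch of one way such a spatio-temporal saliency map could be computed. It is an illustration only, not the paper's STWD formulation: the block size, the mean-color block descriptor, the L2 distances, the linear fusion weight alpha, and all function names are assumptions introduced for demonstration.

import numpy as np

def block_view(frame, block=16):
    """Partition an HxWx3 frame into non-overlapping block x block patches."""
    H, W, C = frame.shape
    h, w = H // block, W // block
    return frame[:h * block, :w * block].reshape(h, block, w, block, C)

def spatial_dissimilarity(frame, block=16):
    """Per-block spatial conspicuity: mean distance of a block's descriptor
    (here simply its average color) to the descriptors of all other blocks."""
    v = block_view(frame, block).mean(axis=(1, 3))        # h x w x C descriptors
    h, w, C = v.shape
    flat = v.reshape(-1, C)
    d = np.linalg.norm(flat[:, None, :] - flat[None, :, :], axis=-1)
    return d.mean(axis=1).reshape(h, w)

def motion_feature(mv_curr, mv_prev):
    """Per-block motion feature: magnitude of the difference between the
    block motion vectors of adjacent frames (each array is h x w x 2)."""
    return np.linalg.norm(mv_curr - mv_prev, axis=-1)

def saliency_map(frame, mv_curr, mv_prev, block=16, alpha=0.5):
    """Fuse normalized spatial and motion features into a block-level saliency map
    (a simple weighted sum here; the paper's weighting scheme differs)."""
    def norm(x):
        return (x - x.min()) / (x.max() - x.min() + 1e-8)
    s = norm(spatial_dissimilarity(frame, block))
    m = norm(motion_feature(mv_curr, mv_prev))
    sal = alpha * s + (1 - alpha) * m
    # Upsample the block map back to pixel resolution for visualization.
    return np.kron(sal, np.ones((block, block)))

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    frame = rng.random((240, 320, 3))        # stand-in for a video frame
    mv_prev = rng.normal(size=(15, 20, 2))   # stand-in block motion vectors
    mv_curr = rng.normal(size=(15, 20, 2))
    print(saliency_map(frame, mv_curr, mv_prev).shape)   # (240, 320)

In practice the block motion vectors would come from a motion estimation step (or directly from the compressed bitstream), and the fused map would be compared against recorded eye-fixation data for evaluation.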
