Abstract

Image saliency detection has been widely explored in recent decades, but computational modeling of visual attention for video sequences remains limited due to the difficulty of temporal saliency extraction and the fusion of spatial and temporal saliency. Inspired by Gestalt theory, we introduce a novel spatiotemporal saliency detection model in this study. First, we compute spatial and temporal saliency maps from low-level visual features. We then merge these two saliency maps for spatiotemporal saliency prediction of video sequences. The spatial saliency map is calculated by extracting three kinds of features, namely color, luminance, and texture, while the temporal saliency map is computed by extracting motion features estimated from the video sequences. A novel adaptive entropy-based uncertainty weighting method, grounded in Gestalt theory, is designed to fuse the spatial and temporal saliency maps into the final spatiotemporal saliency map. The Gestalt principle of similarity is used to estimate spatial uncertainty from spatial saliency, while temporal uncertainty is computed from temporal saliency by the Gestalt principle of common fate. Experimental results on three large-scale databases show that our method predicts visual saliency more accurately than state-of-the-art spatiotemporal saliency detection algorithms.
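The abstract does not give implementation details of the fusion step. The following is a minimal sketch of one plausible reading of entropy-based uncertainty weighting, assuming a global Shannon entropy of each map's value distribution as its uncertainty and inverse-uncertainty weights; the function names, bin count, and normalization are illustrative assumptions, not the authors' method.

    import numpy as np

    def map_entropy(saliency_map, bins=16):
        # Shannon entropy of the map's value histogram,
        # used here as a proxy for the map's uncertainty (assumption).
        hist, _ = np.histogram(saliency_map, bins=bins, range=(0.0, 1.0))
        p = hist / max(hist.sum(), 1)
        p = p[p > 0]
        return -np.sum(p * np.log2(p))

    def fuse_spatiotemporal(spatial_map, temporal_map, eps=1e-6):
        # Entropy-based uncertainty weighting: the map with lower entropy
        # (lower uncertainty) receives the larger weight in the fusion.
        u_s = map_entropy(spatial_map)   # spatial uncertainty (similarity cue)
        u_t = map_entropy(temporal_map)  # temporal uncertainty (common-fate cue)
        w_s = 1.0 / (u_s + eps)
        w_t = 1.0 / (u_t + eps)
        fused = (w_s * spatial_map + w_t * temporal_map) / (w_s + w_t)
        # Rescale to [0, 1] for display or evaluation.
        return (fused - fused.min()) / (fused.max() - fused.min() + eps)

    # Toy usage: random maps standing in for per-frame saliency predictions.
    rng = np.random.default_rng(0)
    S = rng.random((64, 64))   # spatial saliency (color, luminance, texture)
    T = rng.random((64, 64))   # temporal saliency (motion features)
    final_map = fuse_spatiotemporal(S, T)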
