Abstract
Dynamic videos are viewed fundamentally differently from static images: besides spatial features, motion features also play an important role as a temporal factor. Most existing video saliency models employ optical flow to represent motion; however, optical flow often suffers from discontinuities. We also observe that human fixations on a single video frame are much sparser than those on the identical still picture, yet many spatial saliency models treat each video frame independently as a static image. In this paper, we predict dynamic visual saliency by fusing spatial and temporal features. To construct the temporal relationships among a set of successive frames, we introduce a smoothness operator on the optical flow field to obtain more accurate motion features. Then, considering the sparsity of video saliency, we adapt the weights of the regions surrounding the saliency gravity center in the final maps. Experiments show that our model is more consistent with human eye-tracking benchmarks than state-of-the-art models.
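The abstract does not specify the exact smoothness operator or reweighting scheme, so the following is a minimal Python sketch of the two ideas under generic assumptions: dense optical flow is smoothed spatially and temporally to suppress discontinuities, and the final saliency map is reweighted around its gravity (mass) center. The function names and all parameter values (`sigma_spatial`, `alpha`, `sigma_frac`) are hypothetical, not taken from the paper.

```python
import cv2
import numpy as np

def smoothed_flow_sequence(frames, sigma_spatial=5.0, alpha=0.6):
    """Dense optical flow between successive grayscale frames, smoothed
    spatially (Gaussian) and temporally (exponential moving average) to
    reduce flow discontinuities. Parameters are illustrative only."""
    flows, running = [], None
    for prev, nxt in zip(frames[:-1], frames[1:]):
        # Farneback dense flow: H x W x 2 float32 array of (dx, dy).
        flow = cv2.calcOpticalFlowFarneback(
            prev, nxt, None, 0.5, 3, 15, 3, 5, 1.2, 0)
        # Spatial smoothing of both flow components.
        flow = cv2.GaussianBlur(flow, (0, 0), sigma_spatial)
        # Temporal smoothing: blend with the previous smoothed field,
        # linking successive frames instead of treating them in isolation.
        running = flow if running is None else alpha * flow + (1 - alpha) * running
        flows.append(running)
    return flows

def reweight_by_gravity_center(sal, sigma_frac=0.25):
    """Sharpen a saliency map around its gravity center, mimicking the
    sparser fixation pattern observed on video frames. `sigma_frac` sets
    the Gaussian falloff width as a fraction of the image diagonal."""
    h, w = sal.shape
    total = sal.sum() + 1e-8
    ys, xs = np.mgrid[0:h, 0:w]
    # Gravity center: saliency-weighted mean pixel coordinate.
    cy = (ys * sal).sum() / total
    cx = (xs * sal).sum() / total
    sigma = sigma_frac * np.hypot(h, w)
    # Up-weight regions near the centroid, attenuate distant ones.
    weight = np.exp(-((ys - cy) ** 2 + (xs - cx) ** 2) / (2 * sigma ** 2))
    out = sal * weight
    return out / (out.max() + 1e-8)  # renormalize to [0, 1]
```

In a full pipeline one would fuse the magnitude of the smoothed flow (temporal feature) with a spatial saliency map, then apply the gravity-center reweighting to the fused result; the fusion rule itself is not described in the abstract.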