Abstract

Video saliency prediction (VSP) aims to imitate human eye fixations on video frames. However, the potential of this task has not been fully exploited, since existing VSP methods only model the visual saliency of the observed past frames. In this paper, we present the first attempt to extend this task to video saliency forecasting (VSF), which predicts the attention regions of consecutive future frames. To tackle this problem, we propose a video saliency forecasting transformer (VSFT) network built on a new encoder-decoder architecture. Unlike existing VSP methods, VSFT is the first pure-transformer architecture in the VSP field and does not depend on a pretrained S3D model. In VSFT, the attention mechanism is exploited to capture spatio-temporal dependencies between the observed past frames and the target future frame. We propose cross-attention guidance blocks (CAGBs) to aggregate multi-level representation features and provide sufficient guidance for forecasting. We conduct comprehensive experiments on two benchmark datasets, DHF1K and Hollywood-2, and investigate the forecasting and prediction abilities of existing VSP methods by modifying their supervision signals. Experimental results demonstrate that our method achieves superior performance on both the VSF and VSP tasks.
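The abstract names the cross-attention guidance mechanism but not its internals. Purely as an illustration, the following is a minimal PyTorch sketch of what a cross-attention guidance block could look like, assuming a standard transformer-decoder-style design in which future-frame query tokens attend to past-frame encoder tokens. The class name, feature dimensions, per-level block arrangement, and layer choices are hypothetical assumptions, not the paper's actual architecture.

```python
# Illustrative sketch only: a cross-attention "guidance" block in the spirit of
# the CAGB described above. All shapes and layer choices are assumptions.
import torch
import torch.nn as nn


class CrossAttentionGuidanceBlock(nn.Module):
    """Lets future-frame query tokens attend to one level of past-frame features."""

    def __init__(self, dim: int = 256, num_heads: int = 8):
        super().__init__()
        self.cross_attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.norm1 = nn.LayerNorm(dim)
        self.ffn = nn.Sequential(
            nn.Linear(dim, dim * 4), nn.GELU(), nn.Linear(dim * 4, dim)
        )
        self.norm2 = nn.LayerNorm(dim)

    def forward(self, query_tokens: torch.Tensor, memory_tokens: torch.Tensor) -> torch.Tensor:
        # query_tokens:  (B, N_q, dim) decoder tokens for the target future frame
        # memory_tokens: (B, N_m, dim) encoder tokens from the observed past frames
        attended, _ = self.cross_attn(query_tokens, memory_tokens, memory_tokens)
        x = self.norm1(query_tokens + attended)
        return self.norm2(x + self.ffn(x))


# Aggregating multi-level encoder features: one block per level, applied in sequence.
blocks = nn.ModuleList([CrossAttentionGuidanceBlock() for _ in range(3)])
query = torch.randn(2, 196, 256)                       # hypothetical future-frame queries
levels = [torch.randn(2, 784, 256) for _ in range(3)]  # hypothetical multi-level features
for blk, mem in zip(blocks, levels):
    query = blk(query, mem)
```

The point this sketch illustrates is the guidance flow: the future-frame queries gather information from the past-frame representations level by level, so each encoder level contributes cues to the forecast of the future saliency map.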
