Abstract

• Attention guiding is significant for video saliency prediction based on 3D CNN.
• A spatiotemporally attentive 3D CNN for robust video saliency prediction is proposed.
• An adaptive upsampling module for refining spatial features is proposed.
• A frame-wise attention module for propagating temporal features is proposed.
• The effectiveness of the proposed method is comprehensively evaluated.

3D fully convolutional networks (FCN), which jointly leverage spatial and temporal cues, have achieved great success in video saliency prediction. However, they still have limitations in some challenging cases, e.g., fixation shift. To address this issue, we propose a SpatioTemporally Attentive 3D Network (STA3D) to selectively propagate significant temporal features and refine spatial features in a 3D FCN for video saliency prediction. Extensive experiments on three standard datasets demonstrate the superiority of the proposed model over the state-of-the-art.
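The abstract does not detail the frame-wise attention module, but the general idea of frame-wise attention can be sketched as follows: pool each frame's feature map into a descriptor, score the frames, and softmax-normalize the scores across time so informative frames are emphasized before features are propagated. This is a minimal NumPy sketch under that assumption; the function name, the linear scoring parameters `w` and `b`, and the pooling choice are illustrative, not the paper's actual design.

```python
import numpy as np

def softmax(x):
    """Numerically stable softmax over a 1-D score vector."""
    e = np.exp(x - x.max())
    return e / e.sum()

def frame_wise_attention(features, w, b):
    """Reweight per-frame features of a clip by scalar attention.

    features: (T, C, H, W) feature maps for T frames.
    w (C,), b (scalar): parameters of a hypothetical linear scoring layer.
    Returns the reweighted features and the attention weights.
    """
    # Global average pooling -> one C-dim descriptor per frame.
    desc = features.mean(axis=(2, 3))          # (T, C)
    # Score each frame, then normalize across time.
    scores = desc @ w + b                      # (T,)
    alpha = softmax(scores)                    # (T,) sums to 1
    # Broadcast the weights so salient frames dominate the clip feature.
    return features * alpha[:, None, None, None], alpha
```

Because the weights sum to one across the temporal axis, the module acts as a soft selection over frames rather than a hard cut, which keeps the operation differentiable for end-to-end training.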

