Abstract

• Attention guiding is significant for video saliency prediction based on 3D CNN.
• A spatiotemporally attentive 3D CNN for robust video saliency prediction is proposed.
• An adaptive upsampling module for refining spatial features is proposed.
• A frame-wise attention module for propagating temporal features is proposed.
• The effectiveness of the proposed method is comprehensively evaluated.

3D fully convolutional networks (FCN), which jointly leverage spatial and temporal cues, have achieved great success in video saliency prediction. However, they still have limitations in some challenging cases, e.g., fixation shift. To address this issue, we propose a SpatioTemporally Attentive 3D Network (STA3D) to selectively propagate significant temporal features and refine spatial features in a 3D FCN for video saliency prediction. Extensive experiments on three standard datasets demonstrate the superiority of the proposed model over the state-of-the-art.
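The abstract does not detail the frame-wise attention module, but the general idea of frame-wise attention can be sketched as follows: pool each frame's feature map into a descriptor, score the frames, and softmax-normalize the scores across time so informative frames are emphasized before features are propagated. This is a minimal NumPy sketch under that assumption; the function name, the linear scoring parameters `w` and `b`, and the pooling choice are illustrative, not the paper's actual design.

```python
import numpy as np

def softmax(x):
    """Numerically stable softmax over a 1-D score vector."""
    e = np.exp(x - x.max())
    return e / e.sum()

def frame_wise_attention(features, w, b):
    """Reweight per-frame features of a clip by scalar attention.

    features: (T, C, H, W) feature maps for T frames.
    w (C,), b (scalar): parameters of a hypothetical linear scoring layer.
    Returns the reweighted features and the attention weights.
    """
    # Global average pooling -> one C-dim descriptor per frame.
    desc = features.mean(axis=(2, 3))          # (T, C)
    # Score each frame, then normalize across time.
    scores = desc @ w + b                      # (T,)
    alpha = softmax(scores)                    # (T,) sums to 1
    # Broadcast the weights so salient frames dominate the clip feature.
    return features * alpha[:, None, None, None], alpha
```

Because the weights sum to one across the temporal axis, the module acts as a soft selection over frames rather than a hard cut, which keeps the operation differentiable for end-to-end training.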

