Spatio-Temporal Self-Attention Network for Video Saliency Prediction

Ziqiang Wang, Jijun Wang, Yang Wang, Gongyang Li, Lihua Xu, Zhi Liu, Tianhong Zhang

doi:10.1109/tmm.2021.3139743

Open Access

https://doi.org/10.1109/tmm.2021.3139743

Abstract

3D convolutional neural networks have achieved promising results on video tasks in computer vision, including video saliency prediction, which is explored in this paper. However, 3D convolution encodes visual representations only within a fixed local space-time region determined by its kernel size, whereas human attention is always attracted by relational visual features at different times. To overcome this limitation, we propose a novel Spatio-Temporal Self-Attention 3D Network (STSANet) for video saliency prediction, in which multiple Spatio-Temporal Self-Attention (STSA) modules are employed at different levels of a 3D convolutional backbone to directly capture long-range relations between spatio-temporal features of different time steps. In addition, we propose an Attentional Multi-Scale Fusion (AMSF) module to integrate multi-level features with the perception of context in semantic and spatio-temporal subspaces. Extensive experiments demonstrate the contributions of the key components of our method, and the results on the DHF1K, Hollywood-2, UCF, and DIEM benchmark datasets demonstrate the superiority of the proposed model over state-of-the-art models.
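As a rough illustration of the idea of relating features across time steps, the following is a minimal sketch of generic scaled dot-product self-attention applied to every space-time position of a 3D feature map. It is not the authors' STSA module, whose exact design is not given in the abstract. The function name, the tensor layout `(T, H, W, C)`, the projection matrices `w_q`, `w_k`, `w_v`, and the residual connection are all assumptions made for this example.

```python
import numpy as np


def spatiotemporal_self_attention(x, w_q, w_k, w_v):
    """Generic self-attention over all positions of a (T, H, W, C) feature map.

    Hypothetical helper for illustration only; not the paper's STSA module.
    """
    t, h, w, c = x.shape
    tokens = x.reshape(t * h * w, c)           # one token per space-time position
    q = tokens @ w_q                           # queries, shape (N, d)
    k = tokens @ w_k                           # keys, shape (N, d)
    v = tokens @ w_v                           # values, shape (N, C)

    scores = q @ k.T / np.sqrt(q.shape[-1])    # pairwise similarities, shape (N, N)
    scores -= scores.max(axis=-1, keepdims=True)  # stabilise the softmax
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)

    attended = weights @ v                     # each position mixes all others
    out = x + attended.reshape(t, h, w, c)     # residual connection (assumed)
    return out, weights


# Toy sizes chosen only for the example.
rng = np.random.default_rng(0)
T, H, W, C, D = 4, 3, 3, 8, 4
features = rng.standard_normal((T, H, W, C))
w_q = rng.standard_normal((C, D)) * 0.1
w_k = rng.standard_normal((C, D)) * 0.1
w_v = rng.standard_normal((C, C)) * 0.1

out, weights = spatiotemporal_self_attention(features, w_q, w_k, w_v)
print(out.shape, weights.shape)
```

Because every position attends to every other position, a feature at one time step can draw directly on features from any other time step, which a 3D convolution with a fixed kernel size cannot do in a single layer. The cost is an attention matrix that grows quadratically with the number of space-time positions.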

Journal: IEEE Transactions on Multimedia	Publication Date: Jan 1, 2023
Citations: 21	License type: publisher-specific, author manuscript

Similar Papers

A Depthwise Separable Network for Action Recognition
Shengwei Zhou ... Yang Yang
DEStech Transactions on Computer Science and Engineering
09 Dec 2019

Classification of Hyperspectral Earth Remote Sensing Data Using Combined 3D-2D Convolutional Neural Networks
L.T Nyan ... M.T Do
Herald of the Bauman Moscow State Technical University. Series Instrument Engineering
01 Mar 2022

Identification of Diseased Apple Fruit using 3D and 2D Convolutional Neural Network for Improving Accuracy
K Arunteja ... Amudha
12 Nov 2022

Understanding More About Human and Machine Attention in Deep Neural Networks
Qiuxia Lai ... Yongwei Nie
IEEE Transactions on Multimedia | VOL. 23
06 Jul 2020
