ViGAT: Bottom-Up Event Recognition and Explanation in Video Using Factorized Graph Attention Network

Nikolaos Gkalelis, Dimitrios Daskalakis, Vasileios Mezaris

doi:10.1109/access.2022.3213652

Open Access

https://doi.org/10.1109/access.2022.3213652

Abstract

In this paper, we propose a pure-attention bottom-up approach, called ViGAT, for event recognition and explanation in video. ViGAT utilizes an object detector together with a Vision Transformer (ViT) backbone network to derive object and frame features, and a head network to process these features. The ViGAT head consists of graph attention network (GAT) blocks factorized along the spatial and temporal dimensions in order to effectively capture both local and long-term dependencies between objects or frames. Moreover, using the weighted in-degrees (WiDs) derived from the adjacency matrices at the various GAT blocks, we show that the proposed architecture can identify the most salient objects and frames that explain the decision of the network. A comprehensive evaluation study demonstrates that the proposed approach provides state-of-the-art results on three large, publicly available video datasets (FCVID, Mini-Kinetics, ActivityNet).
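The factorized head and the WiD-based explanation can be illustrated with a minimal, pure-Python sketch. It is not the authors' implementation: it assumes a simple scaled dot-product score with a row-wise softmax as each GAT block's adjacency matrix, a single attention-weighted aggregation per block, and mean pooling between the spatial stage (objects within a frame) and the temporal stage (frames within a video). The function names (`gat_block`, `weighted_in_degrees`, `factorized_head`, `top_k`) are hypothetical, and the learned projections, multiple branches and classifier of the actual ViGAT head are omitted.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_adjacency(nodes):
    """Assumed scoring: A[i][j] = softmax_j(<x_i, x_j> / sqrt(d)).

    Each row sums to 1; the paper's exact GAT scoring may differ.
    """
    d = len(nodes[0])
    scale = 1.0 / math.sqrt(d)
    return [
        softmax([scale * sum(a * b for a, b in zip(xi, xj)) for xj in nodes])
        for xi in nodes
    ]


def gat_block(nodes):
    """One simplified attention block: every node aggregates all nodes,
    weighted by its row of the adjacency matrix (no learned weights)."""
    adj = attention_adjacency(nodes)
    n, d = len(nodes), len(nodes[0])
    out = [
        [sum(adj[i][j] * nodes[j][k] for j in range(n)) for k in range(d)]
        for i in range(n)
    ]
    return out, adj


def weighted_in_degrees(adj):
    """WiD of node j: sum of the weights of all edges entering j (column sum)."""
    n = len(adj)
    return [sum(adj[i][j] for i in range(n)) for j in range(n)]


def mean_pool(vectors):
    """Element-wise mean of a non-empty list of equal-length vectors."""
    return [sum(col) / len(vectors) for col in zip(*vectors)]


def factorized_head(video):
    """video: list of frames; each frame is a non-empty list of object vectors.

    Spatial stage: a GAT block over the objects of each frame.
    Temporal stage: a GAT block over the pooled per-frame features.
    Returns the video feature plus object-level and frame-level WiDs.
    """
    frame_feats, object_wids = [], []
    for objects in video:
        updated, adj = gat_block(objects)
        object_wids.append(weighted_in_degrees(adj))
        frame_feats.append(mean_pool(updated))
    updated_frames, frame_adj = gat_block(frame_feats)
    frame_wids = weighted_in_degrees(frame_adj)
    return mean_pool(updated_frames), object_wids, frame_wids


def top_k(wids, k):
    """Indices of the k nodes with the largest WiD (most salient first)."""
    return sorted(range(len(wids)), key=lambda i: wids[i], reverse=True)[:k]


if __name__ == "__main__":
    # Toy input: 3 frames, 2 detected objects per frame, 2-D features.
    video = [
        [[1.0, 0.0], [0.0, 1.0]],
        [[1.0, 1.0], [0.5, 0.5]],
        [[2.0, 0.0], [0.0, 0.1]],
    ]
    feat, obj_wids, frame_wids = factorized_head(video)
    print("video feature:", feat)
    print("most salient frame:", top_k(frame_wids, 1))
```

Because every row of an adjacency matrix sums to one, the WiDs of a graph with n nodes always sum to n, so they can be compared directly within a graph. Ranking objects or frames by WiD, as `top_k` does here, gives a simple saliency ordering in the spirit of the explanation mechanism described above.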

Journal: IEEE Access
Publication Date: Jan 1, 2022
Citations: 10
License type: CC BY 4.0

Similar Papers

Key-Frame based Event Recognition in Unconstrained Videos using Temporal Features
Prithwish Jana ... Partha Pratim Mohanta
01 Jun 2019

Event Recognition in Videos by Learning from Heterogeneous Web Sources
Lin Chen ... Dong Xu
01 Jun 2013

Best papers in multimedia information retrieval
Michael S Lew
International Journal of Multimedia Information Retrieval | VOL. 2
20 Feb 2013

ERA: A Data Set and Deep Learning Benchmark for Event Recognition in Aerial Videos [Software and Data Sets]
Lichao Mou ... Xiao Xiang Zhu
IEEE Geoscience and Remote Sensing Magazine | VOL. 8
01 Dec 2020
