Video action detection by learning graph-based spatio-temporal interactions

Matteo Tomei, Lorenzo Baraldi, Simone Calderara, Simone Bronzin, Rita Cucchiara

doi:10.1016/j.cviu.2021.103187

Open Access

Abstract

Action detection is a complex task that aims to detect and classify human actions in video clips. Typically, it has been addressed by processing fine-grained features extracted from a video classification backbone. Recently, thanks to the robustness of object and people detectors, more attention has been paid to relationship modeling. Following this line, we propose a graph-based framework to learn high-level interactions between people and objects, in both space and time. In our formulation, spatio-temporal relationships are learned through self-attention on a multi-layer graph structure that can connect entities from consecutive clips, thus capturing long-range spatial and temporal dependencies. The proposed module is backbone-independent by design and does not require end-to-end training. Extensive experiments are conducted on the AVA dataset, where our model demonstrates state-of-the-art results and consistent improvements over baselines built with different backbones. Code is publicly available at https://github.com/aimagelab/STAGE_action_detection.
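The abstract describes learning spatio-temporal relationships through self-attention on a multi-layer graph whose nodes (people and objects) can be connected across consecutive clips. The following pure-Python sketch illustrates that general idea under stated assumptions. It is not the authors' implementation, which lives in the repository linked above. Assumptions: `Node`, `build_adjacency`, and `graph_self_attention` are hypothetical names; each node carries a precomputed feature vector (e.g. pooled from a frozen backbone); two nodes are connected when their clips are at most `temporal_window` apart; and the query/key/value projections are identities, so no weights are learned.

```python
import math
from dataclasses import dataclass
from typing import List


@dataclass
class Node:
    """Hypothetical graph node: one actor or object detected in one clip."""
    feature: List[float]  # precomputed entity feature (assumed to come from a frozen backbone)
    clip: int             # index of the clip in which the entity was detected
    kind: str             # "actor" or "object"; carried along but unused in this sketch


def build_adjacency(nodes: List[Node], temporal_window: int = 1) -> List[List[bool]]:
    """Connect nodes whose clips are at most `temporal_window` apart.

    Assumption: full connectivity inside a sliding temporal window; the
    paper's exact graph construction may differ. Every node links to itself.
    """
    n = len(nodes)
    return [[abs(nodes[i].clip - nodes[j].clip) <= temporal_window for j in range(n)]
            for i in range(n)]


def softmax(scores: List[float]) -> List[float]:
    """Numerically stable softmax over a non-empty list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def graph_self_attention(nodes: List[Node], adjacency: List[List[bool]]) -> List[Node]:
    """One layer of masked scaled dot-product self-attention with a residual.

    Each node attends only to its graph neighbours; identity projections
    stand in for the learned query/key/value maps of a trained model.
    """
    d = len(nodes[0].feature)
    updated = []
    for i, node in enumerate(nodes):
        neighbors = [j for j in range(len(nodes)) if adjacency[i][j]]
        scores = [sum(a * b for a, b in zip(node.feature, nodes[j].feature)) / math.sqrt(d)
                  for j in neighbors]
        weights = softmax(scores)
        # Attention-weighted sum of neighbour features.
        message = [sum(w * nodes[j].feature[k] for w, j in zip(weights, neighbors))
                   for k in range(d)]
        # Residual connection: keep the node's own feature and add the message.
        updated.append(Node([x + m for x, m in zip(node.feature, message)], node.clip, node.kind))
    return updated


# Usage: four entities from clips 0, 1 and 3, processed by two stacked layers.
nodes = [
    Node([1.0, 0.0], clip=0, kind="actor"),
    Node([0.0, 1.0], clip=0, kind="object"),
    Node([1.0, 1.0], clip=1, kind="actor"),
    Node([0.5, 0.5], clip=3, kind="object"),  # no neighbours within one clip
]
adj = build_adjacency(nodes, temporal_window=1)
out = nodes
for _ in range(2):
    out = graph_self_attention(out, adj)
```

Stacking layers is what lets information travel further than the window allows in a single pass: with `temporal_window=1`, a node in clip t sees clips t±1 after one layer and, through its neighbours' updated features, clips t±2 after two layers.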

Journal: Computer Vision and Image Understanding
Publication Date: Feb 27, 2021
Citations: 18
License type: CC BY-NC-ND

Similar Papers

MSDR: Multi-Step Dependency Relation Networks for Spatial Temporal Forecasting
Dachuan Liu ... Peng Han
14 Aug 2022

Motion-Aware Dynamic Graph Neural Network for Video Compressive Sensing
Ruiying Lu ... Bo Chen
IEEE Transactions on Pattern Analysis and Machine Intelligence | Vol. 46 | 01 Dec 2024

Spatio-temporal-based multi-level aggregation network for physical action recognition
Yuhang Wang
Computer Science and Information Systems | Vol. 21 | 01 Jan 2024

Weakly Supervised Temporal Action Detection With Temporal Dependency Learning
Bairong Li ... Yuesheng Zhu
IEEE Transactions on Circuits and Systems for Video Technology | Vol. 32 | 01 Jul 2022