A Tracking-Based Two-Stage Framework for Spatio-Temporal Action Detection

Jing Luo, Hongxiao Fei, Li Chen, Yulin Yang, Rongkai Liu, You Zou, Ronghua Shi, Chao Hu

doi:10.3390/electronics13030479

Abstract

Spatio-temporal action detection (STAD) has received widespread attention and has numerous applications, such as video surveillance and smart education. Current studies follow a localization-based two-stage detection paradigm, which uses a person detector for action localization and a feature processing model with a classifier for action classification. However, the imbalance between task settings and model complexity in STAD causes several problems. First, heavy offline person detectors add to the inference overhead. Second, frame-level actor proposals are incompatible with the video-level feature aggregation and Region-of-Interest feature pooling used in action classification, which limits detection performance under diverse action motions and results in low detection accuracy. In this paper, we propose a tracking-based two-stage spatio-temporal action detection framework called TrAD. The key idea of TrAD is to build video-level consistency and reduce model complexity by generating action track proposals across multiple video frames instead of actor proposals in a single frame. In particular, we use tailored tracking to simulate the behavior of human cognitive actions and use the captured motion trajectories as video-level proposals. We then integrate a proposal scaling method and a feature aggregation module into action classification to enhance feature pooling for detected tracks. Evaluations on the AVA dataset demonstrate that TrAD achieves state-of-the-art (SOTA) performance with 29.7 mAP, while also reducing overall computation by 58% compared to SlowFast.
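
To make the idea of track proposals concrete, the following is a minimal sketch and not the TrAD implementation: the abstract does not specify the tracker or the scaling rule, so greedy IoU linking and center-based box enlargement are swapped in here as assumptions. The names `iou`, `link_tracks`, `scale_box`, and the threshold `iou_thresh` are hypothetical.

```python
from typing import Dict, List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def link_tracks(frames: List[List[Box]], iou_thresh: float = 0.5) -> List[Dict[int, Box]]:
    """Greedily link per-frame boxes into tracks (assumed stand-in tracker).

    Each track maps a frame index to the box of one person in that frame.
    """
    tracks: List[Dict[int, Box]] = []
    for t, boxes in enumerate(frames):
        unmatched = list(boxes)
        for track in tracks:
            last_t = max(track)
            # Only extend tracks that were alive in the previous frame.
            if last_t != t - 1 or not unmatched:
                continue
            best = max(unmatched, key=lambda b: iou(track[last_t], b))
            if iou(track[last_t], best) >= iou_thresh:
                track[t] = best
                unmatched.remove(best)
        # Boxes that matched no existing track start new tracks.
        tracks.extend({t: b} for b in unmatched)
    return tracks


def scale_box(box: Box, factor: float) -> Box:
    """Enlarge a box about its center by `factor` (assumed scaling rule)."""
    cx, cy = (box[0] + box[2]) / 2, (box[1] + box[3]) / 2
    half_w = (box[2] - box[0]) * factor / 2
    half_h = (box[3] - box[1]) * factor / 2
    return (cx - half_w, cy - half_h, cx + half_w, cy + half_h)


frames = [
    [(0, 0, 10, 10), (50, 50, 60, 60)],
    [(1, 0, 11, 10), (50, 51, 60, 61)],
    [(2, 0, 12, 10)],
]
tracks = link_tracks(frames)
```

In this toy input, two people appear in the first two frames and one remains in the third, so the sketch yields one three-frame track and one two-frame track. Each track, rather than each single-frame box, would then serve as the unit passed to feature pooling.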

Journal: Electronics
Publication Date: Jan 23, 2024
License type: CC BY 4.0

Similar Papers

CFENet: Content-aware feature enhancement network for multi-person pose estimation
Xixia Xu ... Qi Zou
Applied Intelligence | VOL. 52
26 Apr 2021

STD-TR: End-to-End Spatio-Temporal Action Detection with Transformers
Zexian Li ... Peng Shi
22 Oct 2021

An Extensive Analysis of the Vision-based Deep Learning Techniques for Action Recognition
Manasa R ... Saranya Kc
International Journal of Advanced Computer Science and Applications | VOL. 12
01 Jan 2020

High-resolution remote sensing image semantic segmentation based on a deep feature aggregation network
Zhen Wang ... Wenzhun Huang
Measurement Science and Technology | VOL. 32
26 May 2021