Abstract
This paper explores distinctive spatio-temporal representations learned in a self-supervised manner for group activity recognition. First, previous networks treat spatial- and temporal-aware information as a whole, which limits their ability to represent the complex spatio-temporal correlations in group activity. We therefore propose Spatial and Temporal Attention Heads (STAHs) to extract spatial- and temporal-aware representations independently, generating complementary contexts that boost group activity understanding. Second, we propose a Global Spatio-Temporal Contrastive (GSTCo) loss to aggregate these two kinds of features. Unlike previous works that focus on individual temporal consistency while overlooking correlations between actors, i.e., a local perspective, we exploit global spatial and temporal dependencies. Moreover, GSTCo effectively avoids the trivial solutions common in contrastive learning by striking the right balance between spatial and temporal representations. Furthermore, our method introduces only modest overhead during pre-training and adds no parameters or computational cost at inference, guaranteeing efficiency. Evaluated on widely used group activity recognition datasets, our method achieves strong performance, and applying our pre-trained backbone to existing networks yields state-of-the-art results. Extensive experiments further verify the generalizability of our method.
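The abstract does not give the exact formulation of the GSTCo loss. As a rough illustration only, a contrastive objective that aligns each clip's spatial-aware feature with its own temporal-aware feature (against other clips in the batch) could take an InfoNCE-style form; the function name, temperature value, and pairing scheme below are assumptions for the sketch, not the paper's actual loss.

```python
import numpy as np

def contrastive_spatio_temporal_loss(spatial, temporal, tau=0.1):
    """Illustrative InfoNCE-style loss between spatial- and temporal-aware
    features (assumed formulation, not the paper's exact GSTCo loss).

    spatial, temporal: (B, D) arrays; row i of each comes from the same clip.
    """
    # L2-normalize both views so similarity is cosine similarity
    s = spatial / np.linalg.norm(spatial, axis=1, keepdims=True)
    t = temporal / np.linalg.norm(temporal, axis=1, keepdims=True)
    # (B, B) similarity matrix scaled by temperature
    logits = (s @ t.T) / tau
    # log-softmax over each row; the matched pair sits on the diagonal
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    # negative log-likelihood of the positive (diagonal) pairs
    return -np.mean(np.diag(log_prob))
```

Pulling the two views toward each other while pushing apart mismatched clips is one standard way a contrastive loss discourages the trivial (collapsed) solution the abstract mentions, since a collapsed representation cannot separate negatives from positives.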
Published in: IEEE Transactions on Circuits and Systems for Video Technology