Group Activity Recognition Research Articles

Group activity recognition (GAR) is an increasingly popular topic in computer vision, and many methods achieve strong recognition performance. However, these methods typically require fine-grained per-person feature extraction and a large network architecture to aggregate individual features or reason about relationships between persons. To reduce the annotation burden and the high training cost, weak supervision has emerged as a promising approach: only coarse-grained labels are used during network training. This paradigm poses two key challenges. First, it is limited in its ability to model temporal relationships among individual persons; second, it tends to focus on less relevant information, which leads to suboptimal parameter optimization. Both challenges result in erroneous judgments about temporal information and in inefficient training. To address them within the weak supervision paradigm, we propose a novel Temporal Contrastive and Spatial Enhancement Coarse-Grained Network (TCSE-CGN) for GAR. TCSE-CGN comprises two simple yet effective streams, the Spatial Enhancement Stream and the Temporal Contrastive Stream. Features are extracted from only a few RGB frames. Half of the extracted feature is sent to the Spatial Enhancement Stream, where an attention mechanism enhances it so that the network automatically learns more representative information. The remaining half is sent to the Temporal Contrastive Stream, which uses contrastive learning to model temporal relationships among all frame-level RGB features; specifically, the network is guided to learn the hidden semantic temporal information in inter-frame sequences. Network parameters are optimized with a combination of a standard cross-entropy loss and a novel temporal contrastive loss. Comprehensive experiments on two widely used datasets, the Volleyball dataset and the Collective dataset, demonstrate the effectiveness of TCSE-CGN. The results show that TCSE-CGN performs competitively with other works that require more supervision and a larger architecture.
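The abstract does not give the form of the temporal contrastive loss, so the following is only a minimal sketch of how a cross-entropy term and a frame-level contrastive term might be combined. It assumes an InfoNCE-style objective in which the positive for frame t is frame t+1 and all other frames of the clip are negatives; the function names, the `temperature` value, and the `weight` parameter are hypothetical and not taken from the paper.

```python
import numpy as np

def log_softmax(x, axis=-1):
    """Numerically stable log-softmax along the given axis."""
    shifted = x - x.max(axis=axis, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=axis, keepdims=True))

def cross_entropy(logits, label):
    """Cross-entropy of one clip's class logits against its group-activity label."""
    return -log_softmax(logits)[label]

def temporal_contrastive_loss(frames, temperature=0.1):
    """InfoNCE-style loss over frame-level features of shape (T, D).

    Assumption (not from the paper): the positive for frame t is frame t+1,
    and every other frame in the clip acts as a negative.
    """
    z = frames / np.linalg.norm(frames, axis=1, keepdims=True)  # unit-length rows
    sim = (z @ z.T) / temperature                               # (T, T) cosine similarities
    np.fill_diagonal(sim, -np.inf)                              # a frame is never its own candidate
    log_prob = log_softmax(sim, axis=1)
    t = np.arange(len(frames) - 1)
    return -log_prob[t, t + 1].mean()

def total_loss(logits, label, frames, weight=1.0):
    """Weighted sum of the classification loss and the temporal contrastive loss."""
    return cross_entropy(logits, label) + weight * temporal_contrastive_loss(frames)
```

In this sketch `frames` stands for the half of the feature routed to the Temporal Contrastive Stream, and `logits` for the clip-level prediction over group-activity classes; how the two streams are fused before classification is not described in the abstract and is left out.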

In a real fish group, only a few key individuals play a dominant role, so it is reasonable to infer group activities from the relationships between individual actions. However, the complex underwater environment and the rapid, similar movements of individual fish tend to produce indistinct action characteristics and adhesion of the data distribution, which makes it difficult to infer the relationships between individual actions directly with a graph convolutional network (GCN). This paper therefore proposes a graph convolution vector calibration (GCVC) network for fish group activity recognition through individual action relationship reasoning. To improve the reasoning ability of the GCN, an activity feature vector calibration module is designed to address data adhesion and the mismatch between the estimated and true distributions. The idea is to first compute statistics of the original data distribution and then make each dimension of the activity feature vector follow a Gaussian distribution, so as to generate a better distribution for similar categories. In addition, we built a fish activity dataset to verify the performance of the proposed algorithm. The experimental results show that GCVC achieves a group activity recognition accuracy of 93.33% and a Macro-F1 of 93.25%, which are 19.21% and 24.2% higher than before, respectively. With GCVC, the corrected activity feature vector distribution is more consistent and data adhesion is reduced, so the model can achieve more fully supervised learning. The fish group activity dataset is available on GitHub: https://github.com/crazysboy/GCVC/tree/master.
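The calibration idea (collect statistics of the original features, then push each dimension toward a Gaussian) can be illustrated with a small sketch. The abstract does not say how GCVC does this, so the code below substitutes a generic recipe: a Tukey power transform on non-negative features, per-class mean and standard deviation, and sampling from a diagonal Gaussian. All function names and the `lam` parameter are hypothetical, and this should not be read as the authors' module.

```python
import numpy as np

def tukey_transform(x, lam=0.5):
    """Tukey ladder-of-powers transform; reduces right skew in non-negative features.

    Assumption: features are non-negative (for example, taken after a ReLU).
    """
    return np.log(x + 1e-6) if lam == 0 else np.power(x, lam)

def class_statistics(features, labels):
    """Per-class mean and standard deviation of every feature dimension."""
    stats = {}
    for c in np.unique(labels):
        rows = features[labels == c]
        stats[int(c)] = (rows.mean(axis=0), rows.std(axis=0))
    return stats

def sample_calibrated(stats, label, n, rng):
    """Draw n feature vectors from the diagonal Gaussian fitted to one class."""
    mean, std = stats[label]
    return rng.normal(mean, std, size=(n, mean.shape[0]))

# Usage on toy data: two activity classes, two feature dimensions.
features = np.array([[1.0, 4.0], [9.0, 16.0], [4.0, 4.0]])
labels = np.array([0, 0, 1])
stats = class_statistics(tukey_transform(features), labels)
extra = sample_calibrated(stats, 0, 10, np.random.default_rng(0))
```

Sampling from the fitted class Gaussians is one way to obtain additional, better-separated training vectors; whether GCVC samples, re-weights, or transforms the features in place is not stated in the abstract.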

Articles published on Group Activity Recognition

Contextual motion-aware for group activity recognition

Group activity recognition using unreliable tracked pose

Multi-dimensional convolution transformer for group activity recognition

Coarse-Fine Nested Network for Weakly Supervised Group Activity Recognition.

MLP-AIR: An effective MLP-based module for actor interaction relation learning in group activity recognition

Exploring global context and position-aware representation for group activity recognition

React: recognize every action everywhere all at once

Spatiotemporal information complementary modeling and group relationship reasoning for group activity recognition

FIFAWC: a dataset with detailed annotation and rich semantics for group activity recognition

Rethinking group activity recognition under the open set condition

Design and Analysis of Efficient Attention in Transformers for Social Group Activity Recognition

Unveiling group activity recognition: Leveraging Local–Global Context-Aware Graph Reasoning for enhanced actor–scene interactions

Temporal Contrastive and Spatial Enhancement Coarse Grained Network for Weakly Supervised Group Activity Recognition

Dynamical Attention Hypergraph Convolutional Network for Group Activity Recognition.

GCVC: Graph Convolution Vector Distribution Calibration for Fish Group Activity Recognition

Active Factor Graph Network for Group Activity Recognition.

Masked Autoencoders for Spatial–Temporal Relationship in Video-Based Group Activity Recognition

Learning Label Semantics for Weakly Supervised Group Activity Recognition

Corrections to “Attention Relational Network for Skeleton-Based Group Activity Recognition”

Multi-level neural prompt for zero-shot weakly supervised group activity recognition
