Abstract

Group activity recognition is a central theme in many domains, such as sports video analysis, CCTV surveillance, sports tactics, and social scene understanding. However, embedding actors' relations in multi-person scenarios remains challenging due to occlusion, motion, and lighting changes. Current studies mainly focus on collective and individual local features from the spatial and temporal perspectives, which limits efficiency, robustness, and portability. To this end, a Spatio-Temporal Attention-Based Graph Convolution Network (STAB-GCN) model is proposed to effectively embed deep, complex relations between actors. Specifically, we leverage the attention mechanism to explore latent spatio-temporal relations between actors. This approach captures spatio-temporal contextual information and improves individual and group embeddings. Then, we feed actor relation graphs built from group activity videos into our proposed STAB-GCN for further inference, which selectively attends to features relevant to the relation extraction task while ignoring irrelevant ones. We perform experiments on three available group activity datasets, achieving better performance than state-of-the-art methods. The results verify the effectiveness of our proposed model and highlight the positive impact of spatio-temporal attention-based graph embedding on group activity recognition.
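To make the core idea concrete, the sketch below illustrates one plausible reading of an attention-based graph convolution over actor features: pairwise attention between spatio-temporal actor nodes yields a soft adjacency matrix, which then drives a graph convolution. This is a minimal illustration, not the authors' implementation; the module name, layer sizes, and tensor shapes are assumptions for demonstration only.

```python
# Minimal sketch (illustrative, not the paper's code): attention-weighted
# graph convolution over actor features. Assumes N actors tracked across
# T frames, each described by a D-dimensional feature vector.
import torch
import torch.nn as nn
import torch.nn.functional as F

class AttentionGraphConv(nn.Module):
    """Builds a soft spatio-temporal relation graph between actor nodes via
    scaled dot-product attention, then aggregates neighbours with a graph
    convolution step (hypothetical module, for illustration)."""
    def __init__(self, in_dim: int, out_dim: int):
        super().__init__()
        self.query = nn.Linear(in_dim, out_dim)  # attention queries per node
        self.key = nn.Linear(in_dim, out_dim)    # attention keys per node
        self.value = nn.Linear(in_dim, out_dim)  # node transform for aggregation

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (B, T*N, D) -- every actor in every frame is one graph node
        q, k = self.query(x), self.key(x)
        # Pairwise affinities between all spatio-temporal nodes,
        # normalised into a soft adjacency matrix
        attn = torch.softmax(q @ k.transpose(-2, -1) / q.size(-1) ** 0.5, dim=-1)
        # Graph convolution: aggregate transformed neighbours weighted by attention
        return F.relu(attn @ self.value(x))

# Example usage: 2 clips, 8 actors over 10 frames, 1024-d features per actor box
feats = torch.randn(2, 10 * 8, 1024)
layer = AttentionGraphConv(1024, 256)
relational_feats = layer(feats)                  # (2, 80, 256) relation-aware actor features
group_embedding = relational_feats.mean(dim=1)   # pooled group-level representation
```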
