Abstract

Group activity recognition is a central theme in many domains, such as sports video analysis, CCTV surveillance, sports tactics, and social scenario understanding. However, there are still challenges in embedding actors’ relations in a multi-person scenario due to occlusion, movement, and light. Current studies mainly focus on collective and individual local features from the spatial and temporal perspectives, which results in inefficiency, low robustness, and low portability. To this end, a Spatio-Temporal Attention-Based Graph Convolution Network (STAB-GCN) model is proposed to effectively embed deep complex relations between actors. Specifically, we leverage the attention mechanism to attentively explore spatio-temporal latent relations between actors. This approach captures spatio-temporal contextual information and improves individual and group embedding. Then, we feed actor relation graphs built from group activity videos into our proposed STAB-GCN for further inference, which selectively attends to the relevant features while ignoring those irrelevant to the relation extraction task. We perform experiments on three available group activity datasets, acquiring better performance than state-of-the-art methods. The results verify the validity of our proposed model and highlight the obstructive impacts of spatio-temporal attention-based graph embedding on group activity recognition.

Full Text
Paper version not known

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call

Disclaimer: All third-party content on this website/platform is and will remain the property of their respective owners and is provided on "as is" basis without any warranties, express or implied. Use of third-party content does not indicate any affiliation, sponsorship with or endorsement by them. Any references to third-party content is to identify the corresponding services and shall be considered fair use under The CopyrightLaw.