Abstract

Action recognition makes significant contributions to sports video analysis, especially for evaluating athletes’ training. In sports video, action information is mainly conveyed by the temporal movement of human body parts, and each part contributes differently to the action representation. To incorporate this observation into action recognition, we propose a part-attention spatio-temporal graph convolutional network (PSGCN) that exploits the dynamic spatio-temporal information in a sports video and learns the importance of different body parts to emphasize their contributions to action recognition. Specifically, PSGCN first divides the human body into six parts, extracts their convolutional neural network (CNN) features, and concatenates the global feature of the whole frame; it then uses a cross-part and cross-frame graph-building module to formulate the graph correlations among parts from different frames. Motivated by the observation that a part with larger temporal variation carries more action information, we further propose a part-attention (PA) learning module to estimate the importance of each part, which strengthens the graph correlations to form a PA graph. Finally, PSGCN applies a graph convolutional network to the learned PA spatio-temporal graph together with the learned part CNN features to obtain the action representation of the given sports video. The whole network is optimized with two losses: a PA loss and an action-classification loss. To demonstrate the superiority of PSGCN, we conduct extensive experiments comparing our model with several state-of-the-art methods on widely used action recognition datasets, particularly for sports actions. The results confirm the advantages of the proposed PSGCN for sports video analysis.
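To make the pipeline described above concrete, the following is a minimal PyTorch sketch, based only on this abstract, of one part-attention graph-convolution layer: per-part (plus global) CNN features are scored by a learned attention, the attention re-weights the cross-part, cross-frame adjacency to form the PA graph, and a graph convolution aggregates features over it. All names and shapes here (PartAttentionGCN, the six-parts-plus-global node layout, the sigmoid attention) are illustrative assumptions, not the authors’ implementation.

```python
# Minimal sketch (not the authors' released code) of a part-attention
# spatio-temporal graph convolution: learned per-node attention strengthens
# the cross-part, cross-frame adjacency before feature aggregation.
import torch
import torch.nn as nn
import torch.nn.functional as F


class PartAttentionGCN(nn.Module):
    """One graph-convolution layer over a cross-part, cross-frame graph,
    re-weighted by learned per-part attention scores (hypothetical layout)."""

    def __init__(self, in_dim, out_dim):
        super().__init__()
        self.fc = nn.Linear(in_dim, out_dim)   # node feature transform
        self.att = nn.Linear(in_dim, 1)        # per-node attention score

    def forward(self, x, adj):
        # x:   (batch, num_nodes, in_dim)  per-part / per-frame CNN features
        # adj: (num_nodes, num_nodes)      cross-part, cross-frame adjacency
        att = torch.sigmoid(self.att(x))            # (batch, num_nodes, 1)
        pa_adj = adj.unsqueeze(0) * att             # attention-strengthened PA graph
        h = torch.bmm(pa_adj, self.fc(x))           # aggregate neighbour features
        return F.relu(h), att.squeeze(-1)           # features + attention for a PA loss


if __name__ == "__main__":
    B, T, P, D = 2, 4, 7, 256                       # assumed: 6 body parts + 1 global node per frame
    nodes = T * P
    feats = torch.randn(B, nodes, D)                # stacked part/global features
    adj = torch.ones(nodes, nodes) / nodes          # placeholder fully connected graph
    layer = PartAttentionGCN(D, 128)
    out, att = layer(feats, adj)
    print(out.shape, att.shape)                     # torch.Size([2, 28, 128]) torch.Size([2, 28])
```

In this sketch the attention output is returned alongside the node features so that a separate PA loss (as mentioned in the abstract) could supervise it; how that loss is defined is not specified here.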
