Abstract

Motion information has been widely exploited for group activity recognition in sports video. However, to model and extract the various kinds of motion information between adjacent frames, existing algorithms use only coarse video-level labels as supervision cues. This may lead to ambiguity in the extracted features and to the omission of the changing rules of motion patterns, which are also important for sports video recognition. In this paper, a latent label mining strategy for group activity recognition in basketball videos is proposed. The strategy obtains a set of latent labels for marking different frames in an unsupervised way, and builds the frame-level and video-level representations with two separate levels of supervision signal. First, the latent labels of motion patterns are mined using an unsupervised hierarchical clustering technique. The generated latent labels are then taken as the frame-level supervision signal to train a deep CNN for frame-level feature extraction. Lastly, the frame-level features are fed into an LSTM network to build the spatio-temporal representation for group activity recognition. Experimental results on the public NCAA dataset demonstrate that the proposed algorithm achieves state-of-the-art performance.
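
The first stage described above, mining latent labels by unsupervised hierarchical clustering, can be illustrated with a minimal sketch. The abstract does not state the per-frame features, the linkage criterion, or the number of latent labels, so everything specific here is an assumption made for illustration: the function name `mine_latent_labels`, the two-dimensional toy descriptors (a global and a local motion magnitude per frame), average linkage, and the choice of two labels. It is not the authors' implementation.

```python
from itertools import combinations
from math import dist


def mine_latent_labels(features, num_labels):
    """Average-linkage agglomerative clustering; returns one label per frame.

    Hypothetical helper for illustration only; the paper's actual
    clustering setup is not given in the abstract.
    """
    # Start with every frame in its own cluster (lists of frame indices).
    clusters = [[i] for i in range(len(features))]

    def linkage(a, b):
        # Average pairwise Euclidean distance between two clusters.
        total = sum(dist(features[i], features[j]) for i in a for j in b)
        return total / (len(a) * len(b))

    while len(clusters) > num_labels:
        # Find the closest pair of clusters (a < b) and merge b into a.
        a, b = min(
            combinations(range(len(clusters)), 2),
            key=lambda p: linkage(clusters[p[0]], clusters[p[1]]),
        )
        clusters[a].extend(clusters[b])
        del clusters[b]

    # Turn cluster membership into a per-frame latent label.
    labels = [0] * len(features)
    for label, members in enumerate(clusters):
        for i in members:
            labels[i] = label
    return labels


# Toy per-frame descriptors: (global camera motion, local player motion).
# These values are invented for the example.
frames = [
    (0.90, 0.10), (0.80, 0.20), (0.10, 0.90),
    (0.20, 0.80), (0.85, 0.15), (0.15, 0.95),
]
labels = mine_latent_labels(frames, num_labels=2)
print(labels)
```

In the pipeline the abstract outlines, labels of this kind would serve as the frame-level supervision signal for the CNN, whose features are then passed to the LSTM. For real data, a library routine such as SciPy's hierarchical clustering would be a more practical choice than this quadratic pure-Python loop.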

Highlights

  • Content-based sports video analysis has been attracting significant attention from the field of computer vision, owing to its widespread applications in the real world [1,2,3,4]

  • For (ii), the 3D CNN framework normally incurs a high computational cost and is weak at modeling long-term information variation, which may lead to the omission of significant motion features and a reduction in recognition accuracy. In view of these problems, this paper aims to mine the latent labels of motion patterns from the frames and to combine two levels of supervision signal to obtain an effective spatio-temporal representation

  • The results demonstrate that our method outperforms current state-of-the-art methods for group activity recognition, which indicates the effectiveness of the proposed method

Summary

Introduction

Content-based sports video analysis has been attracting significant attention from the field of computer vision, owing to its widespread applications in the real world [1,2,3,4]. Take the three-point class as an example: it contains different types of both global and local motion patterns. The global camera motion consists of panning or tilting first and zooming in on the basket last, while the local motion consists of one player in a certain region moving vigorously and the others moving slightly. Both these motion patterns and their typical changing rules can be very helpful in defining the group activity.
