A Joint Framework for Athlete Tracking and Action Recognition in Sports Videos

Longteng Kong, Jie Qin, Di Huang, Yunhong Wang

doi:10.1109/tcsvt.2019.2893318

Abstract

Sports video analysis has received increasing attention in recent years. Athlete tracking and action recognition are two of its major problems, and they are closely related; however, existing studies consider and process them separately. In this paper, we propose a joint framework for athlete tracking and action recognition in sports videos. For athlete tracking, we propose a scaling and occlusion robust tracker, named scaling and occlusion robust compressive tracking, to localize a specific athlete in each frame. It follows the approach of compressive tracking (CT) but extends it in two respects: scale refinement and occlusion recovery. For the former, an objectness method, Edge Boxes, is adopted to generate proposals, which replace the fixed sampling boxes in CT and better fit the scales of candidate objects. For the latter, a candidate obstruction-based solution is presented, which introduces additional trackers to detect possible obstructions and to relocate the target when the occlusion ends. For action recognition, we propose a long-term recurrent region-guided convolutional network, which recognizes predefined actions by modeling discriminative temporal cues in the tracking results. We employ SPP-net to extract robust features from the tracked region in each frame. The features of all frames are then fed into a stack of recurrent sequence models to capture long-term, region-level information. We extensively evaluate the proposed approach on a newly collected sports video benchmark and on the publicly available UIUC2 dataset, and the experimental results demonstrate its effectiveness.
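The compressive tracking step that the tracker builds on can be sketched as follows. This is a minimal, hypothetical sketch assuming the standard CT formulation: a very sparse random projection compresses a high-dimensional feature, and an online Gaussian naive Bayes classifier scores each candidate. All names are illustrative, the Haar-like feature extraction is omitted, the learning rate of 0.85 is an assumed default, and candidates are passed in directly as feature vectors, standing in for the Edge Boxes proposals that replace CT's fixed sampling boxes. It is not the authors' implementation.

```python
import math
import random
from typing import Dict, List, Sequence

Vector = List[float]


def sparse_measurement_matrix(n_low: int, n_high: int, s: float, seed: int = 0) -> List[Vector]:
    """Very sparse random projection (hypothetical helper): each entry is
    +sqrt(s) w.p. 1/(2s), -sqrt(s) w.p. 1/(2s), and 0 otherwise."""
    rng = random.Random(seed)
    root = math.sqrt(s)
    matrix = []
    for _ in range(n_low):
        row = []
        for _ in range(n_high):
            u = rng.random()
            row.append(root if u < 1 / (2 * s) else -root if u < 1 / s else 0.0)
        matrix.append(row)
    return matrix


def compress(matrix: List[Vector], x: Sequence[float]) -> Vector:
    """v = R x: project a high-dimensional feature to a low-dimensional one."""
    return [sum(r * xi for r, xi in zip(row, x)) for row in matrix]


class CompressedNaiveBayes:
    """Per-dimension Gaussian naive Bayes, updated online with rate `lam`."""

    def __init__(self, dim: int, lam: float = 0.85, eps: float = 1e-6):
        self.dim, self.lam, self.eps = dim, lam, eps
        self.mu: Dict[int, Vector] = {}
        self.sigma: Dict[int, Vector] = {}

    def update(self, samples: List[Vector], label: int) -> None:
        n = len(samples)
        means = [sum(v[i] for v in samples) / n for i in range(self.dim)]
        stds = [math.sqrt(sum((v[i] - means[i]) ** 2 for v in samples) / n) for i in range(self.dim)]
        if label not in self.mu:  # first frame: use the sample statistics directly
            self.mu[label], self.sigma[label] = means, stds
            return
        lam, old_mu, old_sd = self.lam, self.mu[label], self.sigma[label]
        # Blend old and new Gaussians; sigma uses the *old* mean, so update it first.
        self.sigma[label] = [
            math.sqrt(lam * s0 ** 2 + (1 - lam) * s ** 2 + lam * (1 - lam) * (m0 - m) ** 2)
            for m0, s0, m, s in zip(old_mu, old_sd, means, stds)
        ]
        self.mu[label] = [lam * m0 + (1 - lam) * m for m0, m in zip(old_mu, means)]

    def score(self, v: Sequence[float]) -> float:
        """H(v) = sum_i [log p(v_i | y=1) - log p(v_i | y=0)]."""
        total = 0.0
        for i, vi in enumerate(v):
            logp = {}
            for label in (1, 0):
                mu, sd = self.mu[label][i], self.sigma[label][i] + self.eps
                logp[label] = -math.log(sd) - (vi - mu) ** 2 / (2 * sd ** 2)
            total += logp[1] - logp[0]
        return total


def select_candidate(clf: CompressedNaiveBayes, matrix: List[Vector], candidates: List[Vector]) -> int:
    """Index of the candidate whose compressed feature scores highest."""
    scores = [clf.score(compress(matrix, c)) for c in candidates]
    return max(range(len(scores)), key=scores.__getitem__)
```

In use, the classifier is fit on compressed features of samples near the target (label 1) and away from it (label 0), then `select_candidate` picks the new target location among the candidate boxes of the next frame.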
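SPP-net suits tracked regions because its spatial pyramid pooling turns a feature map of any height and width into a vector of fixed length. The sketch below is a hypothetical pure-Python illustration, assuming the region's convolutional feature map is already available as nested lists indexed `[channel][row][col]`; the pyramid levels (1, 2, 4), the function name, and the floor/ceil bin-boundary rule are illustrative choices, not SPP-net's exact implementation.

```python
from typing import List, Sequence

FeatureMap = List[List[List[float]]]  # indexed as [channel][row][col]


def _ceil_div(a: int, b: int) -> int:
    return -(-a // b)


def spatial_pyramid_pool(fmap: FeatureMap, levels: Sequence[int] = (1, 2, 4)) -> List[float]:
    """Max-pool each channel over an n x n grid for every pyramid level n and
    concatenate the results. Output length is channels * sum(n * n)."""
    pooled: List[float] = []
    for n in levels:
        for channel in fmap:
            h, w = len(channel), len(channel[0])
            for i in range(n):
                # Row bin i covers [floor(i*h/n), ceil((i+1)*h/n)); never empty.
                r0 = i * h // n
                r1 = max(r0 + 1, _ceil_div((i + 1) * h, n))
                for j in range(n):
                    c0 = j * w // n
                    c1 = max(c0 + 1, _ceil_div((j + 1) * w, n))
                    pooled.append(max(channel[r][c] for r in range(r0, r1) for c in range(c0, c1)))
    return pooled
```

Because the output length depends only on the channel count and the pyramid levels, a small box and a large box around the same athlete yield feature vectors of equal length, which is what allows them to be fed frame by frame into a recurrent model.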
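Feeding per-frame region features through a stack of recurrent layers can be illustrated with a small classifier. This is a hypothetical sketch, not the proposed network: plain tanh RNN layers stand in for the recurrent sequence models (LSTM layers are a common choice in practice), the weights are random and untrained, and averaging the per-frame class probabilities over time is one plausible, assumed way to obtain a clip-level prediction.

```python
import math
import random
from typing import List, Sequence

Vector = List[float]
Matrix = List[Vector]


def _rand_matrix(rows: int, cols: int, rng: random.Random, scale: float = 0.3) -> Matrix:
    return [[rng.uniform(-scale, scale) for _ in range(cols)] for _ in range(rows)]


def _matvec(m: Matrix, x: Sequence[float]) -> Vector:
    return [sum(w * xi for w, xi in zip(row, x)) for row in m]


def _softmax(z: Sequence[float]) -> Vector:
    top = max(z)  # subtract the max for numerical stability
    exps = [math.exp(v - top) for v in z]
    total = sum(exps)
    return [e / total for e in exps]


class TanhRNNLayer:
    """One recurrent layer: h_t = tanh(W_x x_t + W_h h_{t-1} + b)."""

    def __init__(self, in_dim: int, hidden_dim: int, rng: random.Random):
        self.w_x = _rand_matrix(hidden_dim, in_dim, rng)
        self.w_h = _rand_matrix(hidden_dim, hidden_dim, rng)
        self.b = [0.0] * hidden_dim

    def run(self, xs: Sequence[Sequence[float]]) -> List[Vector]:
        h = [0.0] * len(self.b)
        outputs = []
        for x in xs:
            pre = [a + r + b for a, r, b in zip(_matvec(self.w_x, x), _matvec(self.w_h, h), self.b)]
            h = [math.tanh(p) for p in pre]
            outputs.append(h)
        return outputs


class StackedRecurrentClassifier:
    """Hypothetical classifier: stacked recurrent layers plus a softmax head."""

    def __init__(self, feat_dim: int, hidden_dim: int, num_layers: int, num_classes: int, seed: int = 0):
        rng = random.Random(seed)
        dims = [feat_dim] + [hidden_dim] * num_layers
        self.layers = [TanhRNNLayer(dims[k], dims[k + 1], rng) for k in range(num_layers)]
        self.w_out = _rand_matrix(num_classes, hidden_dim, rng)

    def predict_proba(self, frame_features: Sequence[Sequence[float]]) -> Vector:
        seq = list(frame_features)
        for layer in self.layers:
            seq = layer.run(seq)  # each layer's hidden-state sequence feeds the next
        per_frame = [_softmax(_matvec(self.w_out, h)) for h in seq]
        n = len(per_frame)
        return [sum(p[c] for p in per_frame) / n for c in range(len(self.w_out))]
```

In the framework described above, each element of `frame_features` would be the pooled feature of the tracked region in one frame, and training (for example, backpropagation through time) would replace the random weights.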

Full Text