Player identification is an essential and complex task in sports video analysis. Different strategies have been devised over the years, and identification based on jersey numbers is one of the most common approaches given its versatility and relative simplicity. However, automatic detection of jersey numbers is challenging due to changing camera angles, low video resolution, small object size in wide-range shots, and transient changes in the player's posture and movement. In this paper, we present a novel approach for jersey number identification in a small, highly imbalanced dataset from Seattle Seahawks practice videos. We generate novel synthetic datasets of different complexities to mitigate the data imbalance and scarcity in the samples. To show the effectiveness of our synthetic data generation, we use a multi-step strategy that enforces attention on a particular region of interest (the player's torso) to identify jersey numbers. The solution first identifies and crops players in a frame using a person detection model, then uses a human pose estimation model to localize jersey numbers on the detected players, obviating the need to annotate bounding boxes for number detection. To compare performance, we experiment with two sets of Convolutional Neural Networks (CNNs) with different learning objectives: multi-class for two-digit number identification and multi-label for digit-wise detection. Our experiments indicate that our synthetic data generation method improves the accuracy of various CNN models by 9% overall and by 18% on low-frequency numbers.
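The localization step can be illustrated with a minimal sketch of cropping the torso region from pose keypoints. It is not the paper's implementation: it assumes the pose model returns 17 keypoints in COCO order (indices 5 and 6 for the shoulders, 11 and 12 for the hips), and the function name `crop_torso` and the `pad` margin are hypothetical.

```python
import numpy as np

# Assumption: keypoints follow COCO order, where
# 5/6 = left/right shoulder and 11/12 = left/right hip.
TORSO_KEYPOINTS = (5, 6, 11, 12)


def crop_torso(player_img, keypoints, pad=0.1):
    """Crop the box spanned by the shoulder and hip keypoints.

    player_img: H x W x C array holding one detected player crop.
    keypoints:  (17, 2) array of (x, y) pixel coordinates in that crop.
    pad:        hypothetical margin, as a fraction of the torso box size.
    """
    pts = np.asarray(keypoints, dtype=float)[list(TORSO_KEYPOINTS)]
    x0, y0 = pts.min(axis=0)
    x1, y1 = pts.max(axis=0)
    dx, dy = pad * (x1 - x0), pad * (y1 - y0)

    # Clamp the padded box to the image bounds.
    h, w = player_img.shape[:2]
    left = max(int(np.floor(x0 - dx)), 0)
    top = max(int(np.floor(y0 - dy)), 0)
    right = min(int(np.ceil(x1 + dx)), w)
    bottom = min(int(np.ceil(y1 + dy)), h)
    return player_img[top:bottom, left:right]
```

Because the crop is derived from keypoints the pose model already predicts, no number-level bounding boxes need to be annotated; the resulting torso patch would then be passed to the digit classifier.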