This paper presents a synthetic dataset of labeled game situations in recordings of federated handball and basketball matches played in Galicia, Spain. The dataset consists of synthetic data generated from real video frames, including 308,805 labeled handball frames and 56,578 labeled basketball frames extracted from 2105 handball and 383 basketball 5-s video clips. Experts in each sport manually labeled the video clips, while the individual frames were labeled automatically using computer vision and machine learning techniques. The dataset encompasses seven classes of game situations: left attack, left counterattack, left penalty, right attack, right counterattack, right penalty, and timeout. In basketball, the penalty class refers to the free throws attempted by players after they have been fouled by an opposing player. Each frame in the dataset is assigned to one of these classes according to the game situation and its context. Importantly, the dataset does not contain actual video frames; instead, it provides a synthetic, normalized representation of each frame in JSON format. This representation includes player, referee, and ball positions on a normalized field, player and referee velocities, and key regions on the court. Positions of players, referees, and the ball were automatically inferred in each frame by an object detector, followed by a tracking step that follows each object across frames and computes its velocity vector. Finally, the obtained coordinates were normalized through a perspective transformation, so that the data are unaffected by differences in camera setup across arenas. We refer to this standardized coordinate space as the 'unified space'. The dataset holds significant potential for reuse in various domains related to sports analytics and machine learning research. It can serve as a valuable resource for researchers, coaches, and sports enthusiasts, contributing to improvements in player performance, game strategies, match broadcasts, and sports-related technologies.
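To make the normalization step concrete, the following is a minimal sketch of how image positions could be mapped into a unified space by a perspective transformation, and how velocity vectors could then be derived from tracked positions. It is an illustration under stated assumptions, not the pipeline used to build the dataset: the function names (`estimate_homography`, `to_unified_space`, `velocities`), the pixel coordinates of the court corners, the unit-square target, and the frame rate are all hypothetical.

```python
import numpy as np


def estimate_homography(src, dst):
    """Estimate a 3x3 perspective transform from >= 4 point pairs (DLT)."""
    rows = []
    for (x, y), (u, v) in zip(src, dst):
        # Each correspondence contributes two linear constraints on H.
        rows.append([-x, -y, -1, 0, 0, 0, u * x, u * y, u])
        rows.append([0, 0, 0, -x, -y, -1, v * x, v * y, v])
    # The solution is the right singular vector of the smallest singular value.
    _, _, vt = np.linalg.svd(np.asarray(rows, dtype=float))
    h = vt[-1].reshape(3, 3)
    return h / h[2, 2]


def to_unified_space(h, points):
    """Apply the perspective transform to an (N, 2) array of image points."""
    pts = np.asarray(points, dtype=float)
    homogeneous = np.hstack([pts, np.ones((len(pts), 1))])
    mapped = homogeneous @ h.T
    # Divide by the third homogeneous coordinate to return to 2D.
    return mapped[:, :2] / mapped[:, 2:3]


def velocities(track, fps):
    """Finite-difference velocity between consecutive tracked positions."""
    track = np.asarray(track, dtype=float)
    return np.diff(track, axis=0) * fps


# Hypothetical court corners in pixels, as seen by one camera.
image_corners = [(100, 400), (1180, 400), (900, 150), (380, 150)]
# Assumed target: the court as a unit square in the unified space.
unit_corners = [(0, 1), (1, 1), (1, 0), (0, 0)]

H = estimate_homography(image_corners, unit_corners)

# Hypothetical pixel track of one player over three consecutive frames.
pixel_track = [(640, 300), (650, 298), (661, 297)]
unified_track = to_unified_space(H, pixel_track)
player_velocity = velocities(unified_track, fps=25)
```

Because velocities are computed after the transform, they are expressed in unified-space units per second and are comparable across arenas, which is the property the normalization is meant to provide.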