Data-driven match analysis in soccer is a growing discipline in both research and practice. However, public data are scarce, which raises the barrier to entering the field and reduces the reproducibility of methods and results. To bridge this gap, this paper presents a dataset of official match information, event data, and position data from seven matches of the German Bundesliga’s first and second divisions. The match information contains metadata about the matches and their participants. The event data contain timestamps along with descriptions of discrete events, such as passes, shots, or fouls. The position data contain the x/y-coordinates of every player and the ball. By integrating multiple data modalities (timestamped event logs and x/y-coordinates of player and ball positions), the dataset offers a multidimensional view of match dynamics. This dataset supports the validation of existing analytical techniques and facilitates the development of new methodologies in sports analytics. Released under CC-BY 4.0, it promotes transparency, reproducibility, and open science in match analysis research.