Soccer is evolving into a science rather than just a sport, driven by intense competition between professional teams. This transformation requires efforts beyond physical training, including strategic planning, data analysis, and advanced metrics. Coaches and teams increasingly use sophisticated methods and data-driven insights to enhance decision-making. Analyzing team performance is crucial to prepare players and coaches, enabling targeted training and strategic adjustments. Expected goals (xG) analysis plays a key role in assessing team and individual player performance, providing nuanced insights into on-field actions and opportunities. This approach allows coaches to optimize tactics and lineup choices beyond traditional scorelines. However, relying solely on xG might not provide a full picture of player performance, as a higher xG does not always translate into more goals due to the intricacies and variabilities of in-game situations. This paper seeks to refine performance assessments by incorporating predictions for both expected goals (xG) and actual goals (aG). Using this new model, we consider a wider variety of factors to provide a more comprehensive evaluation of players and teams. Another major focus of our study is to present a method for selecting and categorizing players based on their predicted xG and aG performance. Additionally, this paper discusses expected goals and actual goals for each individual game; consequently, we use expected goals per game (xGg) and actual goals per game (aGg) to reflect them. Moreover, we employ regression machine learning models, particularly ridge regression, which demonstrates strong performance in forecasting xGg and aGg, outperforming other models in our comparative assessment. Ridge regression’s ability to handle overlapping and correlated variables makes it an ideal choice for our analysis. This approach improves prediction accuracy and provides actionable insights for coaches and analysts to optimize team performance. By using constructed features from various methods in the dataset, we improve our model’s performance by as much as 12%. These features offer a more detailed understanding of player performance in specific leagues and roles, improving the model’s accuracy from 83% to nearly 95%, as indicated by the R-squared metric. Furthermore, our research introduces a player selection methodology based on their predicted xG and aG, as determined by our proposed model. According to our model’s classification, we categorize top players into two groups: efficient scorers and consistent performers. These precise forecasts can guide strategic decisions, player selection, and training approaches, ultimately enhancing team performance and success.
Read full abstract