Abstract

We compare various extensions of the Bradley–Terry model and a hierarchical Poisson log-linear model in terms of their performance in predicting the outcome of soccer matches (win, draw, or loss). The parameters of the Bradley–Terry extensions are estimated by maximizing the log-likelihood, or an appropriately penalized version of it, while the posterior densities of the parameters of the hierarchical Poisson log-linear model are approximated using integrated nested Laplace approximations. The prediction performance of the various modeling approaches is assessed using a novel, context-specific framework for temporal validation that is found to deliver accurate estimates of the test error. The direct modeling of outcomes via the various Bradley–Terry extensions and the modeling of match scores using the hierarchical Poisson log-linear model demonstrate similar behavior in terms of predictive performance.

Highlights

  • The current paper stems from our participation in the 2017 Machine Learning Journal (Springer) challenge on predicting outcomes of soccer matches from a range of leagues around the world (MLS challenge, in short)

  • The suffix (t) indicates features with coefficients varying with matches played. The model indicated by † is the one we used to compute the probabilities for the submission to the MLS challenge. The acronyms are as follows: BL, baseline; CS, Bradley–Terry with constant strengths; LF, Bradley–Terry with linear features; TVC, Bradley–Terry with time-varying coefficients; AFD, Bradley–Terry with additive feature differences and time interactions; HPL, hierarchical Poisson log-linear model

  • The sets of features that were used in the LF, TVC, AFD and HPL specifications in Table 4 resulted from ad-hoc experimentation with different combinations of features in the LF specification

Summary

Introduction

The current paper stems from our participation in the 2017 Machine Learning Journal (Springer) challenge on predicting outcomes of soccer matches from a range of leagues around the world (MLS challenge, in short). The performance of the various modeling approaches in predicting the outcomes of matches is assessed using a novel, context-specific framework for temporal validation that is found to deliver accurate estimates of the prediction error. The direct modeling of the outcomes using the various Bradley–Terry extensions and the modeling of match scores using the hierarchical Poisson log-linear model deliver similar performance in terms of predicting the outcome.
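The ranked probability score appears in the section outline as one of the criteria for assessing predictions. As a rough illustration of what such a score measures, the sketch below computes it for a single match under the commonly used definition for ordered outcomes: the mean squared difference between the cumulative predicted and cumulative observed probabilities. The outcome ordering (win, draw, loss), the function name, and the argument layout are assumptions made for this example, not details taken from the paper.

```python
from typing import Sequence


def ranked_probability_score(probs: Sequence[float], outcome: int) -> float:
    """Ranked probability score for one match (lower is better).

    probs   -- predicted probabilities for the ordered outcomes,
               assumed here to be (win, draw, loss) and to sum to 1
    outcome -- index of the observed outcome within probs
    """
    r = len(probs)
    if r < 2:
        raise ValueError("need at least two outcome categories")
    if not 0 <= outcome < r:
        raise ValueError("outcome must be a valid index into probs")

    cum_pred = 0.0  # cumulative predicted probability
    cum_obs = 0.0   # cumulative observed indicator (0 or 1)
    total = 0.0
    # The last cumulative term is always 1 - 1 = 0, so it is skipped.
    for i in range(r - 1):
        cum_pred += probs[i]
        if i == outcome:
            cum_obs = 1.0
        total += (cum_pred - cum_obs) ** 2
    return total / (r - 1)


if __name__ == "__main__":
    # A confident, correct forecast scores 0; a confident, wrong one scores 1.
    print(ranked_probability_score([1.0, 0.0, 0.0], 0))
    print(ranked_probability_score([0.0, 0.0, 1.0], 0))
```

Because the score works on cumulative probabilities, it respects the ordering of the outcomes: predicting a draw when the observed result is a win is penalized less than predicting a loss.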

Data exploration
Feature extraction
Bradley–Terry models and extensions
BL: baseline
CS: constant strengths
LF: linear with features
TVC: time-varying coefficients
AFD: additive feature differences with time interactions
Handling draws
Identifiability
Other data-specific considerations
Model structure
Estimation
MLS challenge
Ranked probability score
Classification accuracy
Meta-analysis
Implementation
Results
Conclusions and discussion
