Abstract

How to rank the participants of a sports tournament is a question of fundamental importance. While PageRank has been extensively used for this task, the algorithm’s superiority over simpler ranking methods has never been clearly demonstrated. We address this knowledge gap by comparing the performance of multiple ranking methods on synthetic datasets where the true ranking is known and the methods’ performance can thus be quantified by standard information filtering metrics. Using sports results from 18 major leagues, we calibrate a state-of-the-art model for synthetic sports results, a variation of the classical Bradley–Terry model. We identify the relevant range of parameters under which the model reproduces statistical patterns found in the analyzed empirical datasets. Our evaluation of ranking methods on the synthetic datasets shows that PageRank outperforms the benchmark ranking by the number of wins only early in a tournament, when only a small fraction of all games has been played. Increased randomness in the data, due to home team advantage, for example, further reduces the range of PageRank’s superiority. We propose a new PageRank variant that combines forward and backward propagation on the directed network representing the input sports results. The new method outperforms PageRank in all evaluated settings and, when the fraction of games played is sufficiently small and the sport is not too random, it also outperforms the ranking by the number of wins. Beyond the presented comparison of ranking methods, our work paves the way for designing optimal ranking algorithms for sports results data.
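To make the comparison concrete, the following is a minimal sketch of the kind of experiment described above, not the authors' implementation. It assumes the classical Bradley–Terry win probability (not the calibrated variation used in the paper), standard PageRank on a network with edges pointing from loser to winner (the edge direction is an assumption), and a damping factor of 0.85 (also an assumption). The proposed forward-backward variant is not reproduced here, since the abstract does not specify it. All function names are hypothetical.

```python
import random


def bt_win_prob(f_i, f_j):
    """Classical Bradley-Terry probability that i beats j (positive strengths)."""
    return f_i / (f_i + f_j)


def simulate_games(strengths, n_games, rng):
    """Draw random pairings and return a list of (winner, loser) pairs."""
    teams = list(range(len(strengths)))
    results = []
    for _ in range(n_games):
        i, j = rng.sample(teams, 2)  # two distinct teams
        if rng.random() < bt_win_prob(strengths[i], strengths[j]):
            results.append((i, j))
        else:
            results.append((j, i))
    return results


def win_counts(results, n):
    """Benchmark score: number of wins per team."""
    wins = [0] * n
    for winner, _ in results:
        wins[winner] += 1
    return wins


def pagerank(results, n, alpha=0.85, iters=100):
    """Standard PageRank by power iteration.

    Assumption: each game adds a unit-weight edge loser -> winner, so
    score flows toward teams that win. alpha is the damping factor.
    """
    out_weight = [0.0] * n
    edges = {}
    for winner, loser in results:
        edges[(loser, winner)] = edges.get((loser, winner), 0.0) + 1.0
        out_weight[loser] += 1.0
    score = [1.0 / n] * n
    for _ in range(iters):
        # Teams with no losses have no out-edges; spread their score uniformly.
        dangling = sum(score[i] for i in range(n) if out_weight[i] == 0)
        new = [(1 - alpha) / n + alpha * dangling / n for _ in range(n)]
        for (loser, winner), weight in edges.items():
            new[winner] += alpha * score[loser] * weight / out_weight[loser]
        score = new
    return score


if __name__ == "__main__":
    rng = random.Random(0)
    strengths = [5.0, 4.0, 3.0, 2.0, 1.0]  # hypothetical true strengths
    games = simulate_games(strengths, 20, rng)
    print("wins:    ", win_counts(games, len(strengths)))
    print("pagerank:", pagerank(games, len(strengths)))
```

Because the true strengths are known in such a simulation, either score vector can be compared against the true ordering, which is the sense in which performance on synthetic data can be quantified.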
