Abstract
Professional road cycling is a highly competitive sport, and many factors influence the outcome of a race. These factors can be internal to the rider (e.g., psychological preparedness, physiological profile, and fitness), external (e.g., the weather or the team's strategy), or completely unpredictable (e.g., crashes or mechanical failures). This variety makes perfectly predicting the outcome of a given race an impossible task and the sport even more interesting. Nonetheless, before each race, journalists, ex-pro cyclists, websites, and cycling fans try to predict the top 3, 5, or 10 riders. In this article, we use easily accessible data on road cycling from the past 20 years and the machine learning technique Learn-to-Rank (LtR) to predict the top 10 contenders for 1-day road cycling races. We accomplish this by mapping a relevancy weight to each of the first 10 finishing positions. We assess the performance of this approach on the 2018, 2019, and 2021 editions of six spring classic 1-day races. Finally, we compare the output of the framework with a mass fan prediction using the Normalized Discounted Cumulative Gain (NDCG) metric and the number of correct top 10 guesses. We found that our model, on average, performs slightly better on both metrics than the mass fan prediction. We also analyze which variables of our model have the most influence on the prediction of each race. This approach can give interesting insights to fans before a race, and it can also help sports coaches predict how a rider might perform compared to riders outside of the team.
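The two evaluation measures named above, NDCG and the number of correct top 10 guesses, can be illustrated with a minimal sketch. The exact relevancy weights are not given here, so the linear mapping in `relevance` (1st place gets weight 10, 10th place gets weight 1, everything else 0) is an assumption made for illustration only, as are the function names and the common log2 discount.

```python
import math

def relevance(finish_place, k=10):
    """Assumed linear mapping: 1st -> k, ..., k-th -> 1, outside the top k -> 0."""
    return k - finish_place + 1 if 1 <= finish_place <= k else 0

def dcg(relevances):
    """Discounted cumulative gain with the common log2(position + 1) discount."""
    return sum(rel / math.log2(pos + 1)
               for pos, rel in enumerate(relevances, start=1))

def ndcg_at_k(predicted, actual_places, k=10):
    """predicted: rider names in predicted order.
    actual_places: dict mapping rider name -> actual finishing place."""
    # Riders missing from the results get place 0, which maps to relevance 0.
    gains = [relevance(actual_places.get(rider, 0), k) for rider in predicted[:k]]
    # The ideal ranking lists riders by decreasing relevance.
    ideal = sorted((relevance(p, k) for p in actual_places.values()),
                   reverse=True)[:k]
    ideal_dcg = dcg(ideal)
    return dcg(gains) / ideal_dcg if ideal_dcg > 0 else 0.0

def correct_top_k(predicted, actual_places, k=10):
    """Number of predicted riders who actually finished in the top k."""
    return sum(1 for rider in predicted[:k]
               if 1 <= actual_places.get(rider, 0) <= k)
```

Under these assumptions, a prediction that lists the actual top 10 in the exact finishing order scores an NDCG of 1, while a prediction with the right riders in the wrong order keeps the same count of correct guesses but scores a lower NDCG.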
Highlights
In recent years, the amount of data collected in sports has increased enormously
Whereas the goal in traditional Machine Learning (ML) approaches is to predict an unknown value from past target outputs, be it classification or regression, the goal in Learn-to-Rank is to predict a permutation of a set of items with the most relevant items at the top of the list (Li, 2011)
If each past edition of a race is grouped as a subset, each rider is considered a document, and a weight is mapped to the rider's actual result in that edition, the Learn-to-Rank approach can be applied to predict a ranked top 10 of riders
Summary
The amount of data collected in sports has increased enormously. On the one hand, the usage of sensors on the body (e.g., heart rate monitors) and equipment (e.g., power meters on bicycles) allows detailed profiling of the athlete. [...] ventilatory thresholds from the cardiopulmonary exercise test (Zignoli et al., 2021), and assess the risk of injury (Claudino et al., 2019). Another popular application in sports data science is the prediction of sports event outcomes, of which some examples will be presented. If each past edition of a race is grouped as a subset, each rider is considered a document, and a weight is mapped to the rider's actual result in that edition, the Learn-to-Rank approach can be applied to predict a ranked top 10 of riders.
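The grouping described above can be sketched as follows. This is a minimal illustration, not the authors' pipeline: the row layout `(edition, rider, features, finish_place)`, the linear weight mapping, and the helper names are all hypothetical. The output (a feature list, a label list, and per-edition group sizes) follows the layout that ranking libraries commonly expect, with all riders of one edition kept contiguous.

```python
from collections import defaultdict

def relevance(finish_place, k=10):
    """Assumed linear mapping: 1st -> k, ..., k-th -> 1, outside the top k -> 0."""
    return k - finish_place + 1 if 1 <= finish_place <= k else 0

def build_groups(rows, k=10):
    """rows: iterable of (edition, rider, features, finish_place).
    Returns (X, y, group_sizes) with the rows of each edition kept together."""
    by_edition = defaultdict(list)
    for edition, rider, features, place in rows:
        by_edition[edition].append((rider, features, place))

    X, y, group_sizes = [], [], []
    for edition in sorted(by_edition):        # one group (query) per edition
        entries = by_edition[edition]
        group_sizes.append(len(entries))
        for _rider, features, place in entries:
            X.append(features)                # each rider acts as a document
            y.append(relevance(place, k))     # weight mapped to the result
    return X, y, group_sizes

def predict_top_k(startlist, score, k=10):
    """startlist: list of (rider, features) for an upcoming edition.
    score: scoring function of any trained ranking model (hypothetical here)."""
    ranked = sorted(startlist, key=lambda entry: score(entry[1]), reverse=True)
    return [rider for rider, _features in ranked[:k]]
```

A trained ranking model would supply `score`; sorting the start list of a new edition by that score and keeping the first 10 riders yields the predicted top 10.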