Abstract

The global expansion of the sports betting industry has brought the prediction of sports event outcomes to the foreground of scientific research. In this work, soccer outcome prediction methods are evaluated, focusing on the Greek Super League. Data analysis is conducted, including data cleaning, Sequential Forward Selection (SFS), feature engineering and data augmentation. The most important features are used to train five machine learning models: k-Nearest Neighbor (k-NN), LogitBoost (LB), Support Vector Machine (SVM), Random Forest (RF) and CatBoost (CB). For comparison, the best model is also tested on the English Premier League and the Dutch Eredivisie, using data statistics from six seasons, from 2014 to 2020. Convolutional neural networks (CNNs) with transfer learning are also tested by encoding the tabular data as images: DenseNet201, InceptionV3, MobileNetV2 and ResNet101V2 are evaluated using 10-fold cross-validation, after grid and randomized hyperparameter tuning. This is the first time the Greek Super League has been investigated in depth, identifying important features and comparing performance across several machine and deep learning models, as well as across leagues. In all cases, the experimental results show that CB is the most accurate prediction model, reaching 67.73% accuracy, and that the Greek Super League is the most predictable league.
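The feature selection step named above can be illustrated with a minimal sketch of Sequential Forward Selection scored by 10-fold cross-validation. This is not the authors' code: the abstract does not give the SFS settings, so the 3-nearest-neighbor scorer, the interleaved fold assignment, the function names and the synthetic three-class data are all assumptions made for illustration only.

```python
import random
from collections import Counter


def knn_predict(train_X, train_y, x, k=3):
    """Majority vote among the k nearest training rows (squared Euclidean distance)."""
    dists = sorted(
        (sum((a - b) ** 2 for a, b in zip(row, x)), label)
        for row, label in zip(train_X, train_y)
    )
    votes = Counter(label for _, label in dists[:k])
    return votes.most_common(1)[0][0]


def cv_accuracy(X, y, features, n_folds=10, k=3):
    """Mean k-fold cross-validated accuracy using only the given feature columns."""
    Xs = [[row[j] for j in features] for row in X]
    scores = []
    for fold in range(n_folds):
        # Assumption: simple interleaved folds (row i goes to fold i % n_folds).
        test_idx = [i for i in range(len(Xs)) if i % n_folds == fold]
        train_idx = [i for i in range(len(Xs)) if i % n_folds != fold]
        train_X = [Xs[i] for i in train_idx]
        train_y = [y[i] for i in train_idx]
        hits = sum(knn_predict(train_X, train_y, Xs[i], k) == y[i] for i in test_idx)
        scores.append(hits / len(test_idx))
    return sum(scores) / n_folds


def sequential_forward_selection(X, y, n_select, n_folds=10, k=3):
    """Greedily add the feature that gives the best cross-validated accuracy."""
    selected, remaining = [], list(range(len(X[0])))
    while len(selected) < n_select:
        best = max(
            remaining,
            key=lambda j: cv_accuracy(X, y, selected + [j], n_folds, k),
        )
        selected.append(best)
        remaining.remove(best)
    return selected


# Synthetic data, NOT from the paper: only column 1 carries class information.
rng = random.Random(0)
y = [rng.choice(["home", "draw", "away"]) for _ in range(150)]
shift = {"home": 0.0, "draw": 4.0, "away": 8.0}
X = [[rng.gauss(0, 1), shift[label] + rng.gauss(0, 1), rng.gauss(0, 1)] for label in y]

selected = sequential_forward_selection(X, y, n_select=1)
score = cv_accuracy(X, y, selected)
print("selected feature indices:", selected)
print("10-fold accuracy with selected features:", round(score, 3))
```

Each round re-scores every remaining candidate with cross-validation, so the cost grows with the number of features times the number of folds. On real match data the scorer would be whichever model is being tuned, rather than the toy k-NN used here.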
