Machine learning-based football match prediction system

Tianyou Wang, Zheng Zhang, Shengxin Zhu

doi:10.54254/2755-2721/92/20241749

Abstract

This study develops a machine learning-based system to predict the outcomes of English Premier League (EPL) matches, combining Principal Component Analysis (PCA) for dimensionality reduction with three classifiers: K-Nearest Neighbors (KNN), Random Forests, and Support Vector Machines (SVM). The analysis covered a large dataset of matches, and the data were normalized to keep features consistent across models. Among these methods, Random Forests delivered the most robust performance, particularly in predicting wins and losses. However, both Random Forests and SVM struggled to predict draws accurately, which indicates where further refinement is needed. The predicted probabilities largely fell within a specific range, suggesting that the models captured some underlying patterns. Nevertheless, the models overfit substantially: they performed well on the training data but generalized poorly to new, unseen matches. This highlights the need for more effective regularization to curb overfitting and improve predictive accuracy in real-world use.
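The abstract does not give the features, hyperparameters, or exact position of PCA in the pipeline, so the following is only a minimal sketch of how such a comparison and the overfitting diagnosis might look, assuming a scikit-learn workflow in which scaling and PCA precede each classifier. All names (`make_synthetic_matches`, `build_models`, `evaluate`), the H/D/A labels, and every hyperparameter are hypothetical, and the data are synthetic rather than EPL matches. Comparing training accuracy with held-out accuracy exposes overfitting, and per-class recall shows whether draws ("D") lag behind wins and losses. The regularized forest (`max_depth`, `min_samples_leaf`) illustrates one possible remedy, not the authors' method.

```python
# Minimal sketch under stated assumptions: synthetic data, hypothetical features,
# scikit-learn pipelines. This is NOT the paper's dataset or exact method.
import numpy as np
from sklearn.decomposition import PCA
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, recall_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

LABELS = ["H", "D", "A"]  # home win, draw, away win


def make_synthetic_matches(n=1500, n_features=8, seed=0):
    """Hypothetical stand-in for match features: a latent home-minus-away
    strength gap drives every feature; small outcome scores become draws."""
    rng = np.random.default_rng(seed)
    gap = rng.normal(size=n)
    weights = rng.uniform(0.5, 1.5, n_features)
    X = np.outer(gap, weights) + rng.normal(size=(n, n_features))
    X *= rng.uniform(1.0, 100.0, n_features)  # mixed feature scales, so scaling matters
    score = gap + rng.normal(scale=0.8, size=n)  # outcome noise
    y = np.where(score > 0.4, "H", np.where(score < -0.4, "A", "D"))
    return X, y


def build_models():
    """Hypothetical hyperparameters; the paper's settings are not given here."""
    return {
        "KNN": KNeighborsClassifier(n_neighbors=15),
        "RandomForest": RandomForestClassifier(n_estimators=200, random_state=0),
        # One regularization option: shallower trees with larger leaves.
        "RandomForest_regularized": RandomForestClassifier(
            n_estimators=200, max_depth=6, min_samples_leaf=20, random_state=0
        ),
        "SVM": SVC(kernel="rbf", C=1.0),
    }


def evaluate(seed=0):
    X, y = make_synthetic_matches(seed=seed)
    X_tr, X_te, y_tr, y_te = train_test_split(
        X, y, test_size=0.25, stratify=y, random_state=seed
    )
    results = {}
    for name, clf in build_models().items():
        # Scaler and PCA are fit on the training split only (inside the pipeline),
        # so no statistics leak from the test matches.
        model = make_pipeline(StandardScaler(), PCA(n_components=5), clf)
        model.fit(X_tr, y_tr)
        train_acc = accuracy_score(y_tr, model.predict(X_tr))
        pred_te = model.predict(X_te)
        test_acc = accuracy_score(y_te, pred_te)
        recalls = recall_score(y_te, pred_te, labels=LABELS, average=None)
        results[name] = {
            "train_acc": train_acc,
            "test_acc": test_acc,
            "gap": train_acc - test_acc,  # a large gap signals overfitting
            "recall": dict(zip(LABELS, recalls)),  # per-class, e.g. draws ("D")
        }
    return results


if __name__ == "__main__":
    for name, r in evaluate().items():
        rec = r["recall"]
        print(
            f"{name:25s} train={r['train_acc']:.2f} test={r['test_acc']:.2f} "
            f"gap={r['gap']:.2f} recall H/D/A="
            f"{rec['H']:.2f}/{rec['D']:.2f}/{rec['A']:.2f}"
        )
```

On real data, the synthetic generator would be replaced by actual match features, and a chronological split (training on earlier seasons, testing on later ones) is usually a more realistic test for match prediction than the random split used in this sketch.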
