Abstract
This study examines the use of machine learning models to forecast attendance at sports stadiums, analyzing over 5,055 National Football League (NFL) regular-season games from 2000 to 2019. The models, including Linear Regression, Classification and Regression Trees (CART), Random Forest, CatBoost, and XGBoost, integrate a diverse set of variables covering team performance, economic indicators, stadium characteristics, and weather conditions. Each model's accuracy is assessed using five statistical metrics. The models achieve a Mean Absolute Error (MAE) of 0.02 and a Root Mean Squared Error (RMSE) of 0.04, and the coefficient of determination (R²) reaches 77.27% after optimization. These figures suggest that the models, particularly Random Forest and CatBoost, are effective in forecasting attendance rates for NFL games. The most significant predictors of game attendance include 'stadium_name', 'personal_income', 'stadium_age', and 'home_club_age'. This study fills a theoretical gap in the limited research on the NFL and provides valuable insights for strategic planning and decision-making in professional sports management.
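For readers unfamiliar with the reported metrics, the following is a minimal sketch of how MAE, RMSE, and R² are conventionally computed. It is not the study's code: the function name `regression_metrics` and the sample values are hypothetical, and the sketch assumes attendance is expressed as a rate (share of capacity filled).

```python
import math

def regression_metrics(y_true, y_pred):
    """Return (MAE, RMSE, R^2) for paired actual and predicted values."""
    n = len(y_true)
    errors = [t - p for t, p in zip(y_true, y_pred)]
    # Mean Absolute Error: average magnitude of the errors.
    mae = sum(abs(e) for e in errors) / n
    # Root Mean Squared Error: penalizes large errors more heavily.
    ss_res = sum(e * e for e in errors)
    rmse = math.sqrt(ss_res / n)
    # R^2: share of variance in y_true explained by the predictions.
    mean_true = sum(y_true) / n
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    r2 = 1.0 - ss_res / ss_tot
    return mae, rmse, r2

# Hypothetical attendance rates for illustration only, not data from the study.
actual = [0.95, 0.88, 1.00, 0.91]
predicted = [0.93, 0.90, 0.97, 0.92]
mae, rmse, r2 = regression_metrics(actual, predicted)
print(f"MAE={mae:.4f} RMSE={rmse:.4f} R2={r2:.4f}")
```

Because RMSE squares each error before averaging, it is always at least as large as MAE, which is consistent with the reported pair of 0.02 and 0.04.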