Beyond Early Warning Indicators: High School Dropout and Machine Learning

Dario Sansone

doi:10.1111/obes.12277

Abstract

This paper combines machine learning with economic theory in order to analyse high school dropout. It provides an algorithm to predict which students are going to drop out of high school by relying only on information from 9th grade. This analysis emphasizes that using a parsimonious early warning system – as implemented in many schools – leads to poor results. It shows that schools can obtain more precise predictions by exploiting the available high‐dimensional data jointly with machine learning tools such as Support Vector Machine, Boosted Regression and Post‐LASSO. Goodness‐of‐fit criteria are selected based on the context and the underlying theoretical framework: model parameters are calibrated by taking into account the policy goal – minimizing the expected dropout rate – and the school budget constraint. Finally, this study verifies the existence of heterogeneity through unsupervised machine learning by dividing students at risk of dropping out into different clusters.

Full Text