Abstract

In this work, we present a study that applies machine learning tools to quantitative variables in order to detect undergraduate students at risk of dropping out and to identify the factors associated with this behavior. Clustering algorithms and classification methods were tested to determine the predictive power of several variables for dropout on an imbalanced database of 14,495 undergraduate students, with an actual dropout rate of 8.5% and a retention rate of 91.5%. The usual classification criterion, which assigns an individual to a class if their probability of belonging to it is greater than 50%, yielded accuracies of 13.2% for dropout prediction and 99.4% for retention prediction. Of eight classifiers tested, Random Forest was selected and combined with Threshold Probability (adjusting the probability threshold used to assign the dropout class). This allowed us to gradually raise dropout precision above 50% while keeping retention and global precisions above 70%. The study found that the main variables associated with student dropout were academic performance during the early weeks of the first semester, average grade in previous academic levels, previous mathematics score, and entrance exam score. Other important variables were the number of class hours being taken, student age, scholarship funding status, English level, and the number of subjects dropped in the early weeks. Given the trade-off between dropout and retention precisions, our results can help educational institutions focus on the most appropriate academic support strategies for students at real risk of dropping out.
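To illustrate how moving the probability threshold trades dropout performance against retention performance, here is a minimal sketch. It assumes a classifier (for example, a Random Forest) has already produced a dropout probability for each student, with label 1 for dropout and 0 for retention. It reads the per-class figures as the share of each class classified correctly; that reading is an assumption. The function names, the 0.70 floor in the selection rule, and the toy data are hypothetical and are not the authors' procedure or data.

```python
from typing import Optional, Sequence, Tuple


def class_accuracies(
    y_true: Sequence[int], p_dropout: Sequence[float], threshold: float
) -> Tuple[float, float, float]:
    """Return (dropout_acc, retention_acc, global_acc) when a student is
    predicted to drop out if p_dropout is strictly greater than threshold."""
    tp = tn = pos = neg = 0
    for y, p in zip(y_true, p_dropout):
        predicted_dropout = p > threshold
        if y == 1:  # actual dropout
            pos += 1
            if predicted_dropout:
                tp += 1
        else:  # actual retention
            neg += 1
            if not predicted_dropout:
                tn += 1
    dropout_acc = tp / pos if pos else 0.0
    retention_acc = tn / neg if neg else 0.0
    global_acc = (tp + tn) / (pos + neg) if pos + neg else 0.0
    return dropout_acc, retention_acc, global_acc


def sweep_thresholds(
    y_true: Sequence[int],
    p_dropout: Sequence[float],
    thresholds: Sequence[float],
    min_other: float = 0.70,  # hypothetical floor for retention and global accuracy
) -> Optional[Tuple[float, float, float, float]]:
    """Hypothetical selection rule: among thresholds whose retention and global
    accuracies stay at or above min_other, keep the one with the highest
    dropout accuracy. Returns (threshold, dropout, retention, global) or None."""
    best = None
    for t in thresholds:
        d, r, g = class_accuracies(y_true, p_dropout, t)
        if r >= min_other and g >= min_other and (best is None or d > best[1]):
            best = (t, d, r, g)
    return best


# Toy, made-up data: 2 dropouts and 8 retained students.
y = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
p = [0.45, 0.30, 0.10, 0.20, 0.35, 0.05, 0.15, 0.25, 0.40, 0.60]

print(class_accuracies(y, p, 0.5))                       # (0.0, 0.875, 0.7)
print(sweep_thresholds(y, p, [0.5, 0.4, 0.3, 0.2]))      # (0.4, 0.5, 0.875, 0.8)
```

On this toy data, the default 0.5 cut-off misses both dropouts. Lowering it to 0.4 catches one of them while retention and global accuracy stay above the floor. Lowering it further catches more dropouts but pushes retention accuracy below the floor, which is the kind of trade-off the abstract describes.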
