Distance education supports lifelong learning and empowers individuals in rapidly changing societal conditions, yet it suffers from high dropout rates caused by a range of individual and societal obstacles. This study addresses the challenge of building a practical dropout prediction model by analyzing extensive real-world time-point data from a well-established online university in Seoul. Using 144,540 instances collected from 2018 to 2022, the study integrates diverse datasets and compares the accuracy of models trained on longitudinal, semester-wise, and gender-specific datasets. A stepwise backward elimination process over demographic, academic, and online metrics identified significant dropout indicators, including age (particularly when binned), residential area, specific occupations, GPA, and LMS log metrics. The results show that, despite societal changes, data from the four most recent semesters can be used to train stable prediction models. Gender-based analysis showed that different factors influence dropout risk for male and female students. The Light Gradient Boosting Machine (LGBM) algorithm achieved the highest prediction accuracy, as measured by ROC-AUC. Logistic regression, however, also performed competitively while offering in-depth interpretation of the predictors. In South Korea's distinct educational setting, combining high-accuracy algorithms such as LGBM with the interpretive strength of logistic regression is key to designing effective student support strategies.
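The two evaluation ideas named above, stepwise backward elimination and ROC-AUC, can be illustrated with a minimal sketch. It assumes a generic, criterion-agnostic elimination loop: `evaluate` is a hypothetical callback (for example, refit a model on the given features and return its validation ROC-AUC). The abstract does not state the study's actual elimination criterion (p-values, AIC, or a validation metric), so this is not the authors' procedure. ROC-AUC is computed here as the probability that a randomly chosen dropout receives a higher risk score than a randomly chosen non-dropout, with ties counted as half.

```python
from typing import Callable, List, Sequence


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """ROC-AUC as P(score of a random positive > score of a random negative).

    Labels are 1 for dropout and 0 otherwise; ties count as half a win.
    This pairwise version is O(n_pos * n_neg), fine for illustration only.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def backward_eliminate(
    features: Sequence[str],
    evaluate: Callable[[List[str]], float],
    tol: float = 0.0,
) -> List[str]:
    """Greedy backward elimination (hypothetical, criterion-agnostic sketch).

    At each step, drop the feature whose removal costs the least, provided the
    score does not fall by more than `tol` in that single step. `evaluate` is
    an assumed callback that scores a feature subset (higher is better).
    """
    current = list(features)
    best = evaluate(current)
    while len(current) > 1:
        # Score every subset that leaves out exactly one feature.
        candidates = [
            (evaluate([f for f in current if f != drop]), drop) for drop in current
        ]
        score, drop = max(candidates, key=lambda c: c[0])
        if score < best - tol:
            break  # every removal hurts too much; keep the current set
        current.remove(drop)
        best = score
    return current
```

For instance, with a toy `evaluate` that rewards hypothetical features such as `gpa` and `age_bin` and slightly penalizes an uninformative `noise` column, the loop drops `noise` and then stops, because removing any remaining feature lowers the score. In practice, scikit-learn's `roc_auc_score` computes the same AUC more efficiently.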