This research primarily aimed to evaluate several predictive models for identifying programming students at risk of dropping out. It also aimed to identify the attributes that are significant in predicting such students. The educational data mining (EDM) process was used as the research framework. Ten-fold cross-validation showed that the k-nearest neighbors (kNN) algorithm achieved the highest classification accuracy at 95.5%, followed by the decision tree at 94.9%, logistic regression at 94.4%, and the neural network at 93.2%. Further analysis using confusion matrices and receiver operating characteristic (ROC) curves provided detailed insight into the models' performance. Notably, the decision tree algorithm excelled at identifying students who did not drop out, while it misclassified 9 of the 30 students who dropped. The analysis also showed that students' assignments completed (AC), laboratory work (LW), and attendance (ATT) were the strongest predictors of dropout risk. Instructors can use these results to identify students at risk of dropping in advance and to provide the intervention needed to improve their performance in programming.
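To make the evaluation procedure concrete, the following is a minimal sketch of ten-fold cross-validation with a kNN classifier, written in plain Python. It is not the study's code or data: the synthetic records, the helper names (`make_synthetic_students`, `knn_predict`, `k_fold_accuracy`), the choice of k = 5, and the score distributions are all assumptions made for illustration. Only the three feature names (AC, LW, ATT) are taken from the abstract.

```python
import random
from collections import Counter


def make_synthetic_students(n=200, seed=0):
    """Hypothetical data: (AC, LW, ATT) scores in [0, 1] plus a 0/1 dropped label."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        dropped = rng.random() < 0.2  # assumed dropout share, not from the study
        center = 0.35 if dropped else 0.8  # assumed typical score per group
        features = tuple(
            min(1.0, max(0.0, rng.gauss(center, 0.12))) for _ in range(3)
        )
        rows.append((features, int(dropped)))
    return rows


def knn_predict(train, x, k=5):
    """Majority vote among the k training rows closest to x (squared Euclidean)."""
    nearest = sorted(
        train, key=lambda row: sum((a - b) ** 2 for a, b in zip(row[0], x))
    )[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]


def k_fold_accuracy(rows, n_folds=10, k=5, seed=0):
    """Shuffle once, split into n_folds parts, and test on each part in turn."""
    rows = rows[:]  # copy so the caller's list is not reordered
    random.Random(seed).shuffle(rows)
    folds = [rows[i::n_folds] for i in range(n_folds)]
    correct = 0
    for i, test in enumerate(folds):
        # Train on every fold except the one currently held out.
        train = [row for j, fold in enumerate(folds) if j != i for row in fold]
        correct += sum(knn_predict(train, x, k) == y for x, y in test)
    return correct / len(rows)


if __name__ == "__main__":
    data = make_synthetic_students()
    print(f"10-fold accuracy: {k_fold_accuracy(data):.3f}")
```

Each record is used for testing exactly once and for training nine times, so the reported accuracy is an average over all records. Overall accuracy alone can hide weak performance on the smaller class, which is why per-class measures such as the confusion matrix remain useful alongside it.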