Early prediction of student performance in online programming courses is essential for implementing timely interventions to enhance academic outcomes. This study aimed to predict academic success by comparing four machine learning models: Logistic Regression, Random Forest, Support Vector Machine (SVM), and Neural Network (Multilayer Perceptron, MLP). We analyzed data from the Moodle Learning Management System (LMS) and external factors of 591 students enrolled in online object-oriented programming courses at the Universidad Estatal de Milagro (UNEMI) between 2022 and 2023. The data were preprocessed to address class imbalance using the synthetic minority oversampling technique (SMOTE), and relevant features were selected based on Random Forest importance rankings. The models were trained and optimized using Grid Search with cross-validation. Logistic Regression achieved the highest Area Under the Receiver Operating Characteristic Curve (AUC-ROC) on the test set (0.9354), indicating strong generalization capability. SVM and Neural Network models performed adequately but were slightly outperformed by the simpler models. These findings suggest that integrating LMS data with external factors enhances early prediction of student success. Logistic Regression is a practical and interpretable tool for educational institutions to identify at-risk students, and to implement personalized interventions.
Read full abstract