Massive Open Online Courses (MOOCs) have become widely popular in recent years. Although these courses enroll large numbers of students, a large percentage of them drop out, which makes dropout prediction a problem of fundamental importance in this area. Predicting dropout early allows course organizers and educators to intervene and provide targeted support to at-risk students: additional resources, personalized assistance, or interventions tailored to the specific challenges a student faces, all of which increase the chance of successful course completion. This study first pre-processes the dataset to create a thirty-day correlation matrix for each learner, enabling early dropout prediction by the end of the first week. It then proposes six new models that combine ensemble classification techniques with Convolutional Neural Networks (CNN) and Long Short-Term Memory (LSTM) networks. The CNN performs automatic feature extraction, while the LSTM models the time-series aspect of the data to improve early prediction performance. Because ensemble classifiers can reduce the variance of prediction errors, combining them with neural networks can improve accuracy and F1 score without overfitting. Applying these techniques yields more accurate week-by-week dropout prediction. Experimental results on the KDD Cup 2015 dataset (from XuetangX, a MOOC platform in China; 39 courses, 79,186 students, 120,542 course enrollments, and 8,157,277 records collected over 30 days) show that all Bagging models improve on the performance of their base models. For one of the proposed models (Bagging LSTM-LSTM), accuracy reached 94% at the end of the fifth week, with an average accuracy of 91%. Precision and recall averaged 92%, and the F1 score reached 98%, a significant improvement over previous research.
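The Bagging step mentioned above can be illustrated with a minimal sketch. It is not the study's implementation: the base learner here is a simple activity-threshold rule standing in for the CNN and LSTM base models, and every function name and the toy activity counts are assumptions made up for illustration. The sketch shows only the general idea of training each base model on a bootstrap resample and combining predictions by majority vote.

```python
import random
from collections import Counter


def total_activity(weekly_counts):
    """Collapse a learner's weekly activity counts into one feature (hypothetical)."""
    return sum(weekly_counts)


def fit_stump(features, labels):
    """Stand-in base learner: predict dropout (1) when activity < threshold.

    Tries every observed feature value as a threshold and keeps the first
    one with the highest training accuracy.
    """
    best_threshold, best_correct = None, -1
    for threshold in sorted(set(features)):
        correct = sum(int(f < threshold) == y for f, y in zip(features, labels))
        if correct > best_correct:
            best_threshold, best_correct = threshold, correct
    return best_threshold


def fit_bagging(learners, labels, n_models=25, seed=0):
    """Train n_models base learners, each on a bootstrap resample of the data."""
    rng = random.Random(seed)
    features = [total_activity(w) for w in learners]
    n = len(features)
    thresholds = []
    for _ in range(n_models):
        idx = [rng.randrange(n) for _ in range(n)]  # sample with replacement
        thresholds.append(
            fit_stump([features[i] for i in idx], [labels[i] for i in idx])
        )
    return thresholds


def predict_bagging(thresholds, weekly_counts):
    """Majority vote over the base learners' dropout predictions."""
    f = total_activity(weekly_counts)
    votes = Counter(int(f < t) for t in thresholds)
    return votes.most_common(1)[0][0]


# Toy data, invented for illustration: three weeks of activity counts per
# learner; label 1 = dropped out, 0 = completed.
toy_learners = [[0, 1, 0], [2, 0, 1], [1, 1, 1], [0, 0, 2],
                [10, 12, 9], [8, 15, 11], [14, 9, 10], [7, 13, 12]]
toy_labels = [1, 1, 1, 1, 0, 0, 0, 0]

ensemble = fit_bagging(toy_learners, toy_labels)
inactive_prediction = predict_bagging(ensemble, [0, 0, 0])
active_prediction = predict_bagging(ensemble, [12, 12, 12])
```

Because each base learner sees a different resample, their individual errors differ, and the vote averages them out; this is the variance-reduction effect the abstract attributes to ensemble classifiers.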