Abstract

With the wide spread of massive open online courses (MOOCs), millions of people have enrolled in courses, yet the dropout rate of most courses exceeds 90%. Accurately predicting MOOC dropout is important for intervening before learners drop out and for reducing dropout rates. Using data from the PH278x course on the HarvardX platform in spring 2013, we statistically analyzed the factors that may affect whether learners complete the course from two aspects, learners’ own characteristics and their learning behavior, and then built MOOC dropout prediction models based on logistic regression, K-nearest neighbor (KNN) and random forest. Experiments with five evaluation metrics (accuracy, precision, recall, F1 and AUC) show that the random forest model has the highest accuracy, precision, F1 and AUC: 91.726%, 93.0923%, 95.4145% and 0.925341, respectively. It is reported to outperform the logistic regression model and the KNN model, whose values for these metrics are given as 91.395%, 92.8674%, 95.2337% and 0.912316, and 91.726%, 93.0923%, 95.4145% and 0.925341, respectively. For recall, random forest scores higher than KNN but slightly lower than logistic regression; the reported values are 0.992476, 0.977239 and 0.978555, respectively. We therefore conclude that random forest performs best in predicting dropout among MOOC learners. This study can help education staff anticipate learners’ dropout behavior so that they can take measures to reduce dropout before it occurs, thus improving course completion rates.
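The five evaluation metrics named above can be illustrated with a minimal sketch in plain Python. This is not the authors' code: the function names `binary_metrics` and `auc`, the toy labels and scores, the 0.5 decision threshold, and the choice of label 1 as the positive class are all assumptions made for illustration only.

```python
def binary_metrics(y_true, y_pred):
    """Accuracy, precision, recall and F1 for 0/1 labels (1 = positive class, an assumption)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    accuracy = (tp + tn) / len(y_true)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # F1 is the harmonic mean of precision and recall.
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


def auc(y_true, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly; ties count half."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative example")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy data (hypothetical, not taken from the PH278x dataset).
y_true = [1, 1, 1, 1, 0, 0, 1, 0]
scores = [0.9, 0.8, 0.7, 0.4, 0.6, 0.2, 0.95, 0.1]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # assumed 0.5 threshold

metrics = binary_metrics(y_true, y_pred)
metrics["auc"] = auc(y_true, scores)
print(metrics)
```

Accuracy, precision, recall and F1 depend on the thresholded predictions, while AUC is computed from the raw scores, which is one reason a model can lead on AUC without leading on recall.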
