Abstract
This study compares the performance of several machine learning models for predicting student persistence. It begins with a historical review of student retention research and the evolution of predictive models in the field, and highlights why predicting persistence matters for both educational institutions and individual students. It then describes a dataset obtained from ResearchGate, consisting of anonymized undergraduate student data collected between 2008 and 2018, with 37 features and 4,424 records. Ten machine learning algorithms are considered, and two popular ones, Logistic Regression and Random Forest classification, are compared in more detail. Performance is evaluated using accuracy, precision, recall, and F1-score. Results show that the Random Forest model outperforms Logistic Regression in predicting student outcomes, particularly when the synthetic minority oversampling technique (SMOTE) is used to address class imbalance. Overall, this study contributes to student retention research and provides insights for developing targeted support measures to enhance student success in higher education.
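As a rough illustration of the evaluation metrics named above, the following sketch computes accuracy, precision, recall, and F1-score from true and predicted labels for one class treated as positive. It is not the study's code: the function name `binary_metrics`, the label encoding (1 = persists, 0 = does not persist), and the toy labels are all assumptions made for this example.

```python
def binary_metrics(y_true, y_pred, positive=1):
    """Return accuracy, precision, recall and F1 for the `positive` class.

    Hypothetical helper for illustration; labels may be any hashable values.
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")

    tp = fp = fn = tn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == positive and truth == positive:
            tp += 1  # correctly flagged as positive
        elif pred == positive:
            fp += 1  # flagged as positive, actually negative
        elif truth == positive:
            fn += 1  # missed positive
        else:
            tn += 1  # correctly left as negative

    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Toy, made-up labels (assumed encoding: 1 = persists, 0 = does not persist).
y_true = [1] * 8 + [0] * 2

# A model that always predicts the majority class.
always_persist = [1] * 10
majority_scores = binary_metrics(y_true, always_persist, positive=0)

# A model that finds one of the two minority-class students.
mixed_pred = [1] * 7 + [0] + [0, 1]
mixed_scores = binary_metrics(y_true, mixed_pred, positive=0)

print(majority_scores)
print(mixed_scores)
```

On these toy labels, both sets of predictions reach the same accuracy, but the always-majority model has zero recall on the minority class. This is one reason to report precision, recall, and F1-score alongside accuracy when classes are imbalanced.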