Abstract
Context: School dropout is a significant challenge for the Brazilian education system. Several contributing factors need to be corrected, and others eliminated, so that students can access higher education and complete their courses. Motivation: Finding the best model for a specific prediction problem is not a simple task, because the phenomena involved are either not well understood or difficult to model. Combining models therefore often yields better accuracy than any individual model. Several approaches use this combination strategy and have been applied in Data Mining (DM) for prediction and classification. Objective: In this study we propose three models, based on ensemble regression, to predict school dropout, and we apply them in the context of Brazilian Higher Education Institutions. The study may also help identify the factors associated with dropout. To this end, we used two attribute selection techniques, stepwise selection and Pearson correlation, to determine the factors related to dropout. Methodology: We used data from the Higher Education Census and Flow Indicators. The methodology follows CRISP-DM to understand, prepare, and model the data. We used bagging methods to build the predictive models for dropout. Results: The proposed ensemble regression models performed better than the models from the literature. The ensemble model based on bagging of linear regression had a smaller prediction error. The proposed models can help educational administrators and policymakers in the education sector develop new policies relevant to student retention. More broadly, the practical implication of this research is its ability to support early identification of the factors associated with students at risk of dropping out of Higher Education.