Abstract

This paper evaluates several machine learning models under the CRISP-DM methodology to determine, based on their performance metrics, the best model for predicting the performance of high school students in the Colombian Caribbean region on the Saber 11º test. It also proposes a new methodology for evaluating test results by region, so that the socioeconomic particularities of each region are taken into account. CRISP-DM was chosen as the basis because of its maturity: it supports the extraction of business and data knowledge and offers guidance for data preparation, modeling, and model validation. The proposed methodology is expected to be implemented by the Colombian Institute for the Promotion of Higher Education (ICFES), departmental education secretariats, and educational institutions. A variety of techniques and tools were used to develop ETL processes that produce a data set with the most relevant attributes. This data set was used to evaluate four machine learning models built with the J48 (C4.5), LMT, PART, and Multilayer Perceptron algorithms. The best data set was obtained with the InfoGain attribute selection method, and the best model with the LMT decision tree algorithm. This project will therefore help actors in the National Education System make decisions that benefit students and the quality of education in the country, especially in the Caribbean region.
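
To make the InfoGain attribute selection step concrete, the following is a minimal sketch of ranking nominal attributes by information gain with respect to a class label. It is not the authors' implementation: the attribute names (`internet`, `shift`), the class label (`level`), and the four records are hypothetical and chosen only for illustration.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total)
                for n in Counter(labels).values())


def info_gain(values, labels):
    """Information gain of one nominal attribute with respect to the class.

    Computed as H(class) minus the weighted entropy of the class within
    each group of records sharing the same attribute value.
    """
    total = len(labels)
    groups = {}
    for value, label in zip(values, labels):
        groups.setdefault(value, []).append(label)
    remainder = sum(len(g) / total * entropy(g) for g in groups.values())
    return entropy(labels) - remainder


# Hypothetical toy records; not taken from the Saber 11º data.
rows = [
    {"internet": "yes", "shift": "morning", "level": "high"},
    {"internet": "yes", "shift": "afternoon", "level": "high"},
    {"internet": "no", "shift": "morning", "level": "low"},
    {"internet": "no", "shift": "afternoon", "level": "low"},
]
labels = [row["level"] for row in rows]

# Rank candidate attributes from most to least informative.
ranking = sorted(
    ((attr, info_gain([row[attr] for row in rows], labels))
     for attr in ("internet", "shift")),
    key=lambda pair: pair[1],
    reverse=True,
)

for attr, gain in ranking:
    print(f"{attr}: {gain:.3f}")
```

In this toy data, `internet` separates the two classes perfectly and ranks first, while `shift` carries no information about the class. A selection step of this kind would keep only the top-ranked attributes before training the models.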
