Abstract

The South African Education Management Information Systems (EMIS) hosts longitudinal data on school inventory, learners, and educators. Feature selection (FS) is one of the most pervasive yet most frequently neglected phases of machine learning, and neglecting it can adversely affect the outcome of a machine-learning exercise. This study explores which features in EMIS are informative for predicting whether learners will prematurely transition to alternative learning spaces in the Limpopo education system. Ravenstein's migration theory was used to assemble the initial features, which were then subjected to the Boruta, RPART, AdaBoost.M1, and J48 algorithms. The feature subsets generated by these FS algorithms were compared with the results of filter-based statistical methods such as Spearman correlation and mutual information to aid the final selection of the best feature subset for the study. All machine-learning FS methods performed well. The feature subset generated by Boruta was considered optimal because the importance scores of its selected features showed relatively low variance compared with those of RPART, J48, and AdaBoost.M1. This low variance is expected to improve the model's stability and its ability to generalize to previously unseen data.
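
The two comparisons described above can be illustrated with a minimal sketch. It is not the study's implementation: the method labels, feature names, and scores are hypothetical placeholders, the importance scores are assumed to be normalized to a common scale so that their variances are comparable, and only the Spearman filter is shown (mutual information is omitted).

```python
from statistics import mean, pvariance


def rank(values):
    """Return 1-based ranks, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman correlation, computed as the Pearson correlation of the ranks."""
    rx, ry = rank(x), rank(y)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    return cov / (sx * sy)


def lowest_variance_subset(importances):
    """Return the method whose selected features vary least in importance."""
    spread = {m: pvariance(list(s.values())) for m, s in importances.items()}
    return min(spread, key=spread.get), spread


# Hypothetical, made-up scores (not results from the study), assumed to be
# normalized so that each method's importances sum to 1.
example_scores = {
    "method_a": {"feature_1": 0.26, "feature_2": 0.25,
                 "feature_3": 0.24, "feature_4": 0.25},
    "method_b": {"feature_1": 0.55, "feature_2": 0.25,
                 "feature_3": 0.15, "feature_4": 0.05},
}

best, spread = lowest_variance_subset(example_scores)
print(best, spread)
```

Under these assumptions, a subset whose importance is spread evenly across its features has a lower variance than one dominated by a single feature, which is the selection criterion the abstract describes.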
