Abstract

Feature selection is an integral part of feature engineering prior to deep learning (DL) model development. The idea is to reduce the complexity of high-dimensional data by keeping only relevant information in the data mining process. A critical challenge in developing a DL model to predict student performance is the high dimensionality of student profiles, which can result in a DL model with low performance metrics. Student profiles cover different aspects such as demographic information, academic records, technological resources, social attitudes, family background, and socio-economic status, and the diversity of these data sources produces high-dimensional inputs. In this paper, we compared the effectiveness of four feature selection algorithms (Information Gain-based, ReliefF, Boruta, and Recursive Feature Elimination) on deep learning models using an educational dataset from Portugal. Effectiveness is measured using the following model performance metrics: training accuracy, validation accuracy, testing accuracy, kappa statistic, and F-measure. Results revealed the robustness of the Boruta algorithm for dimensionality reduction, as it allowed the deep learning model to achieve its highest performance metrics compared with the other feature selection algorithms.
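To illustrate the first of the compared methods, the sketch below computes information gain (the entropy reduction a feature provides about the class label) for discrete features in pure Python. The toy data and feature names (`studytime`, `sex`) are hypothetical stand-ins for student-profile attributes, not the paper's dataset.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy H(Y) of a label sequence, in bits."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def information_gain(feature_values, labels):
    """IG(Y; X) = H(Y) - H(Y | X) for a discrete feature X."""
    n = len(labels)
    cond = 0.0
    for v in set(feature_values):
        subset = [y for x, y in zip(feature_values, labels) if x == v]
        cond += (len(subset) / n) * entropy(subset)
    return entropy(labels) - cond

# Toy student-performance data (hypothetical): studytime perfectly
# separates pass/fail, while sex carries no class information.
labels    = ["pass", "pass", "fail", "fail"]
studytime = ["high", "high", "low", "low"]
sex       = ["F", "M", "F", "M"]

print(information_gain(studytime, labels))  # 1.0 (fully informative)
print(information_gain(sex, labels))        # 0.0 (uninformative)
```

A selector would rank all features by this score and keep the top-k before training the DL model; ReliefF, Boruta, and RFE replace the ranking criterion with neighbor-based, shadow-feature, and model-driven criteria, respectively.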

