Abstract
In Portugal, the dropout rate of university courses is around 29%. Understanding the reasons behind such a high dropout rate can substantially improve the success of both students and universities. This work applies existing data mining techniques to predict academic dropout, mainly from students' academic grades. Four different machine learning techniques are presented and analyzed. The dataset consists of 331 students who were previously enrolled in the Computer Engineering degree at the Universidade de Trás-os-Montes e Alto Douro (UTAD). The study aims to detect students who may drop out prematurely using existing methods. In the first phase, the most relevant data features were identified using the Permutation Feature Importance technique. In the second phase, several methods were applied to predict dropouts. The results of each machine learning technique were then presented and compared in order to select the best approach for predicting academic dropout. The methods achieved good results, reaching an F1-score of 81% on the final test set, which suggests that students' marks reflect, to some extent, their living conditions.
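As an illustration of the Permutation Feature Importance technique and the F1-score mentioned above, the following is a minimal sketch in plain Python. It is not the authors' implementation: the toy data, the threshold rule `predict`, and the function names are hypothetical and chosen only to show the idea. The importance of a feature is estimated as the average drop in F1-score when that feature's column is shuffled.

```python
import random


def f1_score(y_true, y_pred, positive=1):
    """F1 = harmonic mean of precision and recall for the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def permutation_importance(predict, X, y, n_repeats=10, seed=0):
    """Mean drop in F1 when each feature column is shuffled in turn."""
    rng = random.Random(seed)
    baseline = f1_score(y, [predict(row) for row in X])
    importances = []
    for j in range(len(X[0])):
        drops = []
        for _ in range(n_repeats):
            column = [row[j] for row in X]
            rng.shuffle(column)  # break the link between feature j and the label
            X_perm = [row[:j] + [v] + row[j + 1:] for row, v in zip(X, column)]
            score = f1_score(y, [predict(row) for row in X_perm])
            drops.append(baseline - score)
        importances.append(sum(drops) / n_repeats)
    return baseline, importances


# Hypothetical toy data: [mean grade on a 0-20 scale, unrelated noise value]
X = [[8.0, 3], [9.5, 7], [7.0, 1], [15.0, 4],
     [14.0, 9], [16.5, 2], [9.0, 8], [13.0, 5]]
y = [1, 1, 1, 0, 0, 0, 1, 0]  # 1 = dropped out (assumed labelling)


def predict(row):
    """Stand-in for a fitted model: flags students with a mean grade below 10."""
    return 1 if row[0] < 10.0 else 0


baseline, importances = permutation_importance(predict, X, y)
print("baseline F1:", baseline)
print("importances:", importances)
```

In this sketch the second feature receives an importance of exactly zero because the stand-in rule never reads it, while shuffling the grade column lowers the F1-score.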
Highlights
According to statistics reported by Direção-Geral de Estatística da Educação e Ciência (DGEEC) [1] the dropout rate in Portuguese universities is around 29%, and 14% of the remaining students do not complete the course in the stipulated time
The Artificial Neural Network (ANN) model was implemented as a pipeline (Figure 7), which applies additional pre-processing to the data before it is used to fit the model
In fact, all the methods considered in this paper show promising results in predicting academic dropout, particularly the ANN
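The idea of wrapping an ANN in a pipeline that pre-processes the data before fitting can be sketched as follows. This is an illustrative example under stated assumptions, not the pipeline of Figure 7: the choice of standardization as the pre-processing step, the network size, the class names (`Standardize`, `TinyNet`, `Pipeline`), and the toy data are all hypothetical.

```python
import math
import random


class Standardize:
    """Assumed pre-processing step: zero mean, unit variance per feature."""

    def fit(self, X):
        n = len(X)
        cols = list(zip(*X))
        self.mean = [sum(c) / n for c in cols]
        # Fall back to 1.0 for constant columns to avoid division by zero.
        self.std = [math.sqrt(sum((v - m) ** 2 for v in c) / n) or 1.0
                    for c, m in zip(cols, self.mean)]
        return self

    def transform(self, X):
        return [[(v - m) / s for v, m, s in zip(row, self.mean, self.std)]
                for row in X]


class TinyNet:
    """One tanh hidden layer and a sigmoid output, trained by SGD on log loss."""

    def __init__(self, n_hidden=4, lr=0.1, epochs=500, seed=0):
        self.n_hidden, self.lr, self.epochs, self.seed = n_hidden, lr, epochs, seed

    def _forward(self, x):
        h = [math.tanh(sum(w * v for w, v in zip(ws, x)) + b)
             for ws, b in zip(self.W1, self.b1)]
        z = sum(w * v for w, v in zip(self.W2, h)) + self.b2
        return h, 1.0 / (1.0 + math.exp(-z))

    def fit(self, X, y):
        rng = random.Random(self.seed)
        d = len(X[0])
        self.W1 = [[rng.uniform(-0.5, 0.5) for _ in range(d)]
                   for _ in range(self.n_hidden)]
        self.b1 = [0.0] * self.n_hidden
        self.W2 = [rng.uniform(-0.5, 0.5) for _ in range(self.n_hidden)]
        self.b2 = 0.0
        for _ in range(self.epochs):
            for x, t in zip(X, y):
                h, p = self._forward(x)
                g = p - t  # gradient of log loss w.r.t. the output pre-activation
                for k in range(self.n_hidden):
                    # Backpropagate through tanh, using W2[k] before its update.
                    gh = g * self.W2[k] * (1.0 - h[k] ** 2)
                    self.W2[k] -= self.lr * g * h[k]
                    self.b1[k] -= self.lr * gh
                    for j in range(d):
                        self.W1[k][j] -= self.lr * gh * x[j]
                self.b2 -= self.lr * g
        return self

    def predict(self, X):
        return [1 if self._forward(x)[1] >= 0.5 else 0 for x in X]


class Pipeline:
    """Chains the pre-processing step and the model behind one fit/predict API."""

    def __init__(self, scaler, model):
        self.scaler, self.model = scaler, model

    def fit(self, X, y):
        self.model.fit(self.scaler.fit(X).transform(X), y)
        return self

    def predict(self, X):
        # New data is scaled with the statistics learned during fit.
        return self.model.predict(self.scaler.transform(X))


# Hypothetical toy data: [mean grade on a 0-20 scale, number of courses passed]
X = [[8.0, 2], [9.5, 3], [7.0, 1], [15.0, 9],
     [14.0, 8], [16.5, 10], [9.0, 2], [13.0, 7]]
y = [1, 1, 1, 0, 0, 0, 1, 0]  # 1 = dropped out (assumed labelling)

pipeline = Pipeline(Standardize(), TinyNet()).fit(X, y)
train_pred = pipeline.predict(X)
new_pred = pipeline.predict([[6.0, 1], [17.0, 10]])
print(train_pred, new_pred)
```

Keeping the scaler inside the pipeline means the statistics used to transform new students' data are the ones learned from the training set, which avoids leaking test-set information into the pre-processing step.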
Summary
According to statistics reported by Direção-Geral de Estatística da Educação e Ciência (DGEEC) [1], the dropout rate in Portuguese universities is around 29%, and 14% of the remaining students do not complete the course in the stipulated time. In university education, institutions can take several traditional actions to reduce academic dropout rates. These include personalized monitoring of at-risk students, which requires an enormous allocation of human resources and time, or restructuring the course syllabus. Queiroga et al. [3] developed a solution that uses only students' interactions with the virtual learning environment, and features derived from them, for early prediction of at-risk students in a Brazilian distance technical high school course. They use an elitist genetic algorithm (GA) to tune the hyperparameters of the machine learning algorithms. Another work, by Mubarak et al. [4], combined a Convolutional Neural Network (CNN) with Long Short-Term Memory (LSTM), in a model called CONV-LSTM, to automatically extract features from Massive Open Online