Abstract

Student retention is one of the main issues in higher education. Predicting students' performance helps higher education institutions reduce dropout rates and increase student success. Educational data mining is an emerging field that deals with data from educational settings: reading the data, extracting information, and uncovering hidden knowledge. This research used data from one of the Gulf Cooperation Council (GCC) universities as a case study of higher education in the Middle East. The university has an enrolment of about 20,000 students of many different nationalities. The primary goal of this research is to investigate whether predictive models can be built to predict students' academic performance, and to identify the main factors that influence their performance and Grade Point Average. A generalized model is one that could be applied at any institution that adopts the same grading system, either at the Foundation level, which uses a binary response variable (Pass/Fail), or for students enrolled in undergraduate academic programs, where the response variable is the numeric Grade Point Average. Developing such a model to identify students in jeopardy of dismissal will help reduce the dropout rate through early identification of students who need academic advising, and ultimately improve student success. This research showed that data science algorithms could play a significant role in predicting students' Grade Point Average. Three families of algorithms were applied to build models that predict students' Grade Point Average after 2, 4 or 6 terms: Linear/Logistic Regression, Regression Trees and Random Forest. These models predict a specific student's Grade Point Average from the other values in the dataset.
In this type of model, the target the model needs to learn is specified explicitly. An optimization function (the model) is fitted to find the target output from specific input values. This research opens the door for future comprehensive studies that apply a data science approach to higher-education systems and identify the main factors that influence student performance.
