Abstract

Malaysian citizens are categorized into three income groups: the Top 20 Percent (T20), Middle 40 Percent (M40), and Bottom 40 Percent (B40). One of the focus areas of the Eleventh Malaysia Plan (11MP) is to elevate B40 households towards the middle-income society. In 2018, an estimated 4.1 million households belonged to this group. The government of Malaysia has widened access to higher education for the B40 group in an effort to narrow socioeconomic gaps and improve living standards. Statistical data show that since 2013, the yearly intake of students into bachelor's degree programs at Malaysia's public universities has exceeded 85,000. Despite this large enrolment, not all students graduate, including students from low-income family backgrounds. Data mining with machine learning techniques has been widely and effectively used to predict students at risk of dropping out in general education. However, machine learning work on student attrition in Malaysia's higher education is generally lacking. Therefore, in this research, three machine learning models were developed using the Decision Tree, Random Forest and Artificial Neural Network algorithms to classify attrition among B40 students in bachelor's degree programs at Malaysia's public universities. A comparative performance analysis indicates that the Random Forest model is the best of the three at predicting student attrition in this study. It outperforms the other two models in accuracy, precision, recall and F-measure, with values of 95.93%, 97.10%, 81.26% and 88.50%, respectively. However, the performance difference is statistically significant only between the Random Forest and Decision Tree models; there is no statistically significant difference between the Random Forest and Artificial Neural Network models.

Highlights

  • Malaysia's household income is classified into three groups, which are Bottom 40% (B40), Middle 40% (M40) and Top 20% (T20)

  • The results indicate that the Random Forest (RF) model gives the highest accuracy in predicting student drop-out at 95.93%, followed by the Artificial Neural Network (ANN) at 95.86% and the Decision Tree (DT) at 95.84%

  • RF yields a higher accuracy than the other two models, even when different numbers of attributes are used. This shows that prediction performance can be improved with ensemble learning. The result is in line with the outcome reported in [20], which predicted student drop-out in a higher learning institution and found that the RF prediction model was more accurate than DT
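
The evaluation measures named above (accuracy, precision, recall and F-measure) can be illustrated with a minimal Python sketch. It assumes the standard binary-classification definitions, with F-measure taken as the harmonic mean of precision and recall; the abstract does not state the exact formulas used. The function names and the confusion-matrix counts are hypothetical and chosen only for illustration; they are not data from the study.

```python
def f_measure(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (assumed F1 definition)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def classification_metrics(tp: int, fp: int, fn: int, tn: int) -> dict:
    """Compute the four measures from binary confusion-matrix counts.

    tp/fp/fn/tn are true positives, false positives, false negatives and
    true negatives, with "positive" meaning a student who drops out.
    """
    total = tp + fp + fn + tn
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return {
        "accuracy": (tp + tn) / total if total else 0.0,
        "precision": precision,
        "recall": recall,
        "f_measure": f_measure(precision, recall),
    }


# Hypothetical counts for illustration only (not from the study).
example = classification_metrics(tp=8, fp=2, fn=2, tn=88)

# F-measure implied by the precision and recall reported for Random Forest.
rf_f = f_measure(0.9710, 0.8126)
print(example)
print(f"F-measure from reported RF precision/recall: {rf_f:.4f}")
```

Under this definition, the reported Random Forest precision (97.10%) and recall (81.26%) give an F-measure of roughly 88.5%, close to the reported 88.50%. Any small gap may come from rounding or from how scores were averaged, which the abstract does not specify. The high precision paired with lower recall suggests the model rarely flags a student wrongly but misses some students who do drop out.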


Summary

INTRODUCTION

Malaysia's household income is classified into three groups: Bottom 40% (B40), Middle 40% (M40) and Top 20% (T20). As reported in Higher Education Statistic, from 2011 to 2015 the total student intake for bachelor's degree programmes in Malaysia's public universities was more than 85,000 students yearly. A family's financial burden increases when a student fails to graduate, because the student's education loan must still be repaid. Failing to graduate also affects a student's chances of securing a high-income job. The aim of this paper is to conduct a comparative study of machine learning models for predicting attrition among B40 students in bachelor's degree programmes in Malaysia's public universities.

LITERATURE REVIEW
Objective
Result
RESEARCH METHODOLOGY
Data Preparation
Descriptive Analysis
Modelling and Evaluation
RESULTS AND DISCUSSION
CONCLUSION