Recent years have seen a significant increase in research interest in predicting student performance, preventing failure, and identifying the variables that affect student dropout. The student dropout rate is an important indicator in online and open distance learning courses. We propose the naive Bayes classification method for constructing a student dropout prediction model. This work examines the critical topic of forecasting student dropout in higher education using machine learning approaches, with a particular emphasis on the random forest and naive Bayes algorithms. The study's goal is to predict dropout accurately using data mining methods and machine learning algorithms, following a thorough review of the existing literature and approaches. The methodology consists of data collection from a Kaggle dataset, data preparation that addresses class imbalance via SMOTE oversampling, and algorithm selection. Random forest and naive Bayes outperform the other machine learning algorithms considered in terms of accuracy, sensitivity, specificity, and precision. The study underscores the importance of considering diverse factors such as demographic data, socioeconomic factors, and academic performance in dropout prediction models. The implications of this research extend beyond academia, with the potential to inform proactive interventions and support systems, ultimately leading to improved student outcomes and institutional effectiveness. For binary classification on the dataset used in this project, naive Bayes and random forest combined with SMOTE oversampling performed best.

Keywords: SMOTE oversampling, machine learning, random forest, naive Bayes.
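The SMOTE oversampling step named in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `smote`, its parameters, and the toy data are hypothetical, and the code uses only the Python standard library instead of a dedicated oversampling package. It shows the core idea of creating synthetic minority-class samples by interpolating between a minority sample and one of its nearest minority neighbours.

```python
import math
import random


def smote(minority, n_new, k=3, seed=0):
    """Return n_new synthetic samples for the minority class.

    Each synthetic point lies on the line segment between a randomly
    chosen minority sample and one of its k nearest minority neighbours.
    Assumes `minority` holds at least two samples (tuples of floats).
    """
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # k nearest minority neighbours of `base`, excluding `base` itself
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: math.dist(base, p),
        )[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append(
            tuple(b + gap * (n - b) for b, n in zip(base, neighbour))
        )
    return synthetic


# Hypothetical toy data: (attendance rate, grade average) of dropout students.
dropouts = [(0.40, 9.5), (0.35, 10.0), (0.50, 8.0), (0.45, 11.0)]
new_points = smote(dropouts, n_new=6)
```

In a pipeline like the one described, the synthetic samples would be added to the training split only, so that the evaluation data still reflects the original class balance.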