Abstract

Class-imbalanced data with high attribute dimensionality frequently cause problems in classification: the uneven number of samples in each class and the presence of irrelevant attributes degrade algorithm performance, so techniques are needed both to handle the class imbalance and to select features that reduce data complexity and discard irrelevant attributes. This study therefore applied random oversampling (ROS) to handle the class imbalance and compared two feature-selection methods, information gain and forward selection, to determine which is more effective and more appropriate to apply. The selected features were then used to classify student graduation with a Naïve Bayes classification model. The results show an increase in the average accuracy of the Naïve Bayes method: 81.83% without ROS preprocessing or feature selection, 83.84% with ROS, 86.03% with information gain (3 selected features), and 86.42% with forward selection (2 selected features), corresponding to accuracy gains of 4.2% from no preprocessing to information gain and 4.59% from no preprocessing to forward selection. The best feature-selection method was therefore forward selection with 2 selected features (the 8th-semester GPA and the overall GPA), and both ROS and the two feature-selection methods were shown to improve the performance of the Naïve Bayes method.
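
The described pipeline can be illustrated with a minimal sketch, assuming a hypothetical students.csv file with semester-GPA attributes and a binary graduated label, and using scikit-learn and imbalanced-learn as stand-ins for whatever tooling the study actually used (mutual information approximates information gain here):

```python
# Minimal sketch of the ROS + feature selection + Naive Bayes pipeline.
# "students.csv", its column names, and the library choices are assumptions
# for illustration only, not the study's actual implementation.
import pandas as pd
from imblearn.over_sampling import RandomOverSampler
from sklearn.feature_selection import (SelectKBest, mutual_info_classif,
                                        SequentialFeatureSelector)
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB

df = pd.read_csv("students.csv")                      # hypothetical dataset
X, y = df.drop(columns="graduated"), df["graduated"]

# 1) Random oversampling to balance the minority class.
X_bal, y_bal = RandomOverSampler(random_state=42).fit_resample(X, y)

nb = GaussianNB()

# 2a) Filter-style selection (mutual information as an information-gain proxy), keeping 3 features.
ig = SelectKBest(score_func=mutual_info_classif, k=3).fit(X_bal, y_bal)
X_ig = ig.transform(X_bal)

# 2b) Wrapper-style forward selection, keeping 2 features.
fs = SequentialFeatureSelector(nb, n_features_to_select=2,
                               direction="forward").fit(X_bal, y_bal)
X_fs = fs.transform(X_bal)

# 3) Compare the average cross-validated accuracy of Naive Bayes on each variant.
for name, Xv in [("no selection", X_bal),
                 ("information gain", X_ig),
                 ("forward selection", X_fs)]:
    acc = cross_val_score(nb, Xv, y_bal, cv=10, scoring="accuracy").mean()
    print(f"{name}: {acc:.4f}")
```

Note that in a strict evaluation the oversampling and feature selection would be applied inside each cross-validation fold (e.g., via an imblearn Pipeline) to avoid information leakage into the test folds.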
