Abstract

Class-imbalanced data with high attribute dimensionality frequently cause problems in classification: the uneven number of samples in each class and the presence of irrelevant attributes degrade algorithm performance, so techniques are needed both to handle the class imbalance and to select features that reduce data complexity and discard irrelevant attributes. This study therefore applied random oversampling (ROS) to handle the class imbalance and compared two feature-selection methods, information gain and forward selection, to determine which is more effective and more appropriate to apply. The selected features were then used to classify student graduation with a Naïve Bayes classification model. The results show an increase in the average accuracy of the Naïve Bayes method: 81.83% without ROS preprocessing or feature selection, 83.84% with ROS, 86.03% with information gain (3 selected features), and 86.42% with forward selection (2 selected features), corresponding to accuracy gains of 4.2% from no preprocessing to information gain and 4.59% from no preprocessing to forward selection. The best feature-selection method was therefore forward selection with 2 selected features (the 8th-semester GPA and the overall GPA), and both ROS and the two feature-selection methods were shown to improve the performance of the Naïve Bayes method.
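
The described pipeline can be illustrated with a minimal sketch, assuming a hypothetical students.csv file with semester-GPA attributes and a binary graduated label, and using scikit-learn and imbalanced-learn as stand-ins for whatever tooling the study actually used (mutual information approximates information gain here):

```python
# Minimal sketch of the ROS + feature selection + Naive Bayes pipeline.
# "students.csv", its column names, and the library choices are assumptions
# for illustration only, not the study's actual implementation.
import pandas as pd
from imblearn.over_sampling import RandomOverSampler
from sklearn.feature_selection import (SelectKBest, mutual_info_classif,
                                        SequentialFeatureSelector)
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB

df = pd.read_csv("students.csv")                      # hypothetical dataset
X, y = df.drop(columns="graduated"), df["graduated"]

# 1) Random oversampling to balance the minority class.
X_bal, y_bal = RandomOverSampler(random_state=42).fit_resample(X, y)

nb = GaussianNB()

# 2a) Filter-style selection (mutual information as an information-gain proxy), keeping 3 features.
ig = SelectKBest(score_func=mutual_info_classif, k=3).fit(X_bal, y_bal)
X_ig = ig.transform(X_bal)

# 2b) Wrapper-style forward selection, keeping 2 features.
fs = SequentialFeatureSelector(nb, n_features_to_select=2,
                               direction="forward").fit(X_bal, y_bal)
X_fs = fs.transform(X_bal)

# 3) Compare the average cross-validated accuracy of Naive Bayes on each variant.
for name, Xv in [("no selection", X_bal),
                 ("information gain", X_ig),
                 ("forward selection", X_fs)]:
    acc = cross_val_score(nb, Xv, y_bal, cv=10, scoring="accuracy").mean()
    print(f"{name}: {acc:.4f}")
```

Note that in a strict evaluation the oversampling and feature selection would be applied inside each cross-validation fold (e.g., via an imblearn Pipeline) to avoid information leakage into the test folds.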
