Abstract

A real-world dataset with a disproportionate class distribution is called an imbalanced dataset, and it badly degrades the predictive results of machine learning classification algorithms. Most datasets in machine learning suffer from the class imbalance problem, while most machine learning algorithms work well only when every class has roughly the same number of samples. A variety of solutions have been proposed by different researchers over the years to deal with imbalanced datasets, but their performance remains below a satisfactory level. It is very difficult to design an efficient method using machine learning algorithms without first converting the imbalanced dataset into a balanced one. In this paper we design a method named SGBBA: an efficient method for a prediction system in machine learning using imbalanced datasets. The proposed method maximizes performance in terms of accuracy and the confusion matrix. It consists of two modules: designing the method and method-based prediction. Experiments with two benchmark datasets and one highly imbalanced credit card dataset are performed, and the results are compared with the performance of the SMOTE resampling method. F-score, specificity, precision and recall are used as evaluation metrics to test the performance of the proposed method on any kind of imbalanced dataset. According to this comparison, the proposed method attains more effective and robust performance than the existing methods.

Highlights

  • Nowadays, imbalanced classification from a two-class imbalanced dataset poses a severe problem in data science and machine learning, where one class has supremacy over the other

  • Data preprocessing is performed by resampling the imbalanced dataset, e.g. oversampling the minority class [3], undersampling the majority class [4], or combining oversampling and undersampling through bagging [8] and boosting [7] methods such as SMOTEBoost [9], RUSBoost [10], OverBagging [11] and UnderBagging [12]

  • In order to measure effectiveness and efficiency, we take into account accuracy, specificity, precision, recall and F-score to test our proposed prediction methodology, defined as follows: a) Accuracy: the accuracy rate is normally the most common empirical measure for classification algorithms in machine learning
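The metrics listed above can all be computed directly from the four counts of a binary confusion matrix. The following is a minimal sketch (the counts in the example are hypothetical, not results from the paper):

```python
def classification_metrics(tp, fp, tn, fn):
    """Standard binary-classification metrics from confusion-matrix counts."""
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)           # sensitivity / true positive rate
    specificity = tn / (tn + fp)      # true negative rate
    f_score = 2 * precision * recall / (precision + recall)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "specificity": specificity, "f_score": f_score}

# hypothetical counts from an imbalanced test set (many more negatives than positives)
m = classification_metrics(tp=30, fp=10, tn=950, fn=10)
print(m)  # accuracy 0.98, precision 0.75, recall 0.75, specificity ~0.99, f_score 0.75
```

Note that on an imbalanced test set like this one, accuracy (0.98) looks far better than precision and recall (0.75), which is exactly why the additional metrics are needed.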


Introduction

Nowadays, imbalanced classification from a two-class imbalanced dataset poses a severe problem in data science and machine learning, where one class has supremacy over the other. Data preprocessing is performed by resampling the imbalanced dataset, e.g. oversampling the minority class [3], undersampling the majority class [4], or combining oversampling and undersampling through bagging [8] and boosting [7] methods such as SMOTEBoost [9], RUSBoost [10], OverBagging [11] and UnderBagging [12]. Both oversampling and undersampling introduce limitations into the dataset that can make the prediction results and performance unreliable. Random undersampling refers to eliminating samples from the majority class, which has far more samples than the minority class, until the number of majority class samples equals the number of minority class samples
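The random undersampling step described above can be sketched in plain Python as follows (a minimal illustration; the toy labels and data are hypothetical, not from the paper's datasets):

```python
import random
from collections import Counter

def random_undersample(X, y, seed=42):
    """Randomly drop majority-class samples until all classes match the minority count."""
    rng = random.Random(seed)
    counts = Counter(y)
    _, n_min = min(counts.items(), key=lambda kv: kv[1])  # size of the minority class
    indices = list(range(len(y)))
    rng.shuffle(indices)              # randomize which majority samples survive
    kept, seen = [], Counter()
    for i in indices:
        if seen[y[i]] < n_min:        # cap every class at the minority count
            kept.append(i)
            seen[y[i]] += 1
    kept.sort()                       # preserve the original sample order
    return [X[i] for i in kept], [y[i] for i in kept]

# toy imbalanced data: 8 majority-class (0) vs 2 minority-class (1) samples
X = [[v] for v in range(10)]
y = [0] * 8 + [1] * 2
X_bal, y_bal = random_undersample(X, y)
print(Counter(y_bal))  # both classes now have 2 samples
```

This illustrates the limitation noted above: the six discarded majority samples may carry information the classifier never sees, which is one reason purely random resampling can make results unreliable.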
