Abstract

In data science, machine learning based classification techniques are the first choice for accurate analysis of large volumes of data. A primary requirement when developing such techniques is robustness, so that classification remains accurate and efficient. In practice, however, these algorithms often suffer from the class-imbalance problem (CIP): when one class heavily outnumbers the others, many difficulties arise during the learning process, and overall classification performance degrades. A common remedy for CIP is to resample the data set, generally by oversampling the rare class. Several oversampling techniques are available in the literature, of which SMOTE, ADASYN, and random oversampling are the noted ones. In this paper, an effort is made to compare these techniques and their impact on classification performance.
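To make the resampling idea concrete, the sketch below implements the simplest of the techniques mentioned, random oversampling, in plain Python. This is only an illustration written for this summary, not code from the paper; in practice one would use a library such as imbalanced-learn, whose `RandomOverSampler`, `SMOTE`, and `ADASYN` classes implement all three methods.

```python
import random
from collections import Counter

def random_oversample(X, y, seed=0):
    """Balance classes by duplicating randomly chosen minority samples.

    A minimal illustration of random oversampling (sampling minority
    instances with replacement until all classes reach the majority
    class size). SMOTE and ADASYN instead synthesize new samples by
    interpolating between minority neighbors.
    """
    rng = random.Random(seed)
    counts = Counter(y)
    target = max(counts.values())  # size of the majority class
    X_out, y_out = list(X), list(y)
    for label, n in counts.items():
        idx = [i for i, lab in enumerate(y) if lab == label]
        for _ in range(target - n):
            i = rng.choice(idx)  # pick an existing sample to duplicate
            X_out.append(X[i])
            y_out.append(label)
    return X_out, y_out

# Toy imbalanced data set: 6 majority (class 0) vs 2 minority (class 1)
X = [[0], [1], [2], [3], [4], [5], [10], [11]]
y = [0, 0, 0, 0, 0, 0, 1, 1]
X_bal, y_bal = random_oversample(X, y)
# After resampling, both classes contain 6 samples each
```

Random oversampling changes nothing about the feature space, which is why duplicated points can encourage overfitting; SMOTE and ADASYN were proposed precisely to mitigate this by generating synthetic, rather than copied, minority samples.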
