Abstract

Imbalanced data are datasets in which one class contains many more samples than another: the larger class is called the majority or negative class, and the smaller class is called the minority or positive class. This condition is a problem in data classification because most classifiers tend to predict the majority class and ignore the minority class, so the resulting analysis has poor accuracy for the minority class. Two basic data-level, sampling-based approaches for handling this issue are undersampling and oversampling. The synthetic minority oversampling technique (SMOTE) is an oversampling method that increases the number of positive-class samples by randomly generating synthetic samples from the existing minority data until the number of positive-class samples equals the number of negative-class samples. Another method is Tomek links, an undersampling method that works by decreasing the number of negative-class samples. In this research, combined sampling was performed by combining the SMOTE and Tomek links techniques, with a support vector machine (SVM) as the binary classification method. Based on the accuracy rates in this study, the combined sampling method gave better results than SMOTE or Tomek links alone under 5-fold cross-validation. However, in some extreme cases the combined sampling method is no better than Tomek links alone.
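
The two resampling steps described above can be illustrated with a minimal pure-Python sketch. It is not the implementation used in the study: the function names (`smote`, `tomek_links`, `smote_tomek`), the Euclidean distance, the neighbour count `k`, and the choice to drop only the majority-class member of each Tomek link are all assumptions made for illustration. The sketch also assumes at least two minority samples, and it leaves out the SVM classifier and the 5-fold cross-validation.

```python
import math
import random


def distance(a, b):
    """Euclidean distance between two points given as tuples."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def smote(minority, n_new, k=3, seed=0):
    """Create n_new synthetic points, each interpolated between a random
    minority point and one of its k nearest minority neighbours."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        others = [p for j, p in enumerate(minority) if j != i]
        neighbours = sorted(others, key=lambda p: distance(base, p))[:k]
        other = rng.choice(neighbours)
        gap = rng.random()  # position on the segment from base to other
        synthetic.append(tuple(b + gap * (o - b) for b, o in zip(base, other)))
    return synthetic


def tomek_links(points, labels):
    """Return index pairs (i, j), i < j, that are mutual nearest
    neighbours and belong to different classes."""
    def nearest(i):
        return min((j for j in range(len(points)) if j != i),
                   key=lambda j: distance(points[i], points[j]))

    nn = [nearest(i) for i in range(len(points))]
    return [(i, j) for i, j in enumerate(nn)
            if i < j and nn[j] == i and labels[i] != labels[j]]


def smote_tomek(points, labels, minority_label=1, k=3, seed=0):
    """Oversample the minority class up to the majority count, then drop
    the majority-class member of every Tomek link (an assumed variant)."""
    minority = [p for p, y in zip(points, labels) if y == minority_label]
    n_majority = len(points) - len(minority)
    new = smote(minority, n_majority - len(minority), k, seed)
    points = list(points) + new
    labels = list(labels) + [minority_label] * len(new)
    drop = {i for pair in tomek_links(points, labels) for i in pair
            if labels[i] != minority_label}
    kept = [(p, y) for i, (p, y) in enumerate(zip(points, labels))
            if i not in drop]
    return [p for p, _ in kept], [y for _, y in kept]
```

Calling `smote_tomek(points, labels)` on a list of feature tuples and a parallel list of 0/1 labels returns the resampled points and labels, which could then be passed to any binary classifier.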

