Abstract

Classifying imbalanced data is one of the most challenging problems in supervised learning and data mining. Since the early days of machine learning, a large body of research has produced many successful data balancing methods, yet none of them handles imbalanced data completely, largely because data is generated continuously and irregularly by diverse sources such as humans, machines, sensors, and robots. Existing machine learning algorithms are mostly biased toward the majority class instances and neglect the minority class instances, which severely degrades the predictive power and accuracy of the final results. The most popular approaches for dealing with imbalanced data are under-sampling and over-sampling, cost-sensitive learning, and ensemble learning. In this paper, we present a novel data balancing method that combines a clustering technique with an SVM, using the instances closest to the decision boundary to create balanced data. The proposed approach selects the most informative majority-class and minority-class support vectors nearest to the decision boundary (hyperplane) and combines them into a balanced training set. We evaluate the proposed technique with popular single classifiers, namely the C4.5 Decision Tree (DT) and Naive Bayes (NB), and with ensemble classifiers, namely Random Forest and AdaBoost, on thirteen standard real-life imbalanced datasets. The experimental results show that the proposed method yields improved accuracy on both the minority and majority class instances compared to existing methods.
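As a rough illustration of the boundary-based selection step (not the authors' implementation, which also involves a clustering stage), the sketch below fits a linear SVM with scikit-learn, keeps the support vectors of both classes, and trims the majority-class support vectors to those closest to the hyperplane so the two classes contribute comparable numbers of boundary instances. The function name boundary_balance and all parameter choices are illustrative assumptions.

    # Minimal sketch, assuming a binary imbalanced dataset held in NumPy arrays.
    import numpy as np
    from sklearn.svm import SVC

    def boundary_balance(X, y):
        """Return a reduced, more balanced training set built from SVM support vectors."""
        svm = SVC(kernel="linear", C=1.0)
        svm.fit(X, y)

        # Indices of the instances lying on or near the margin (the support vectors).
        sv_idx = svm.support_
        X_sv, y_sv = X[sv_idx], y[sv_idx]

        # Identify the minority class among the support vectors.
        classes, counts = np.unique(y_sv, return_counts=True)
        minority = classes[np.argmin(counts)]
        n_min = counts.min()

        keep = []
        for c in classes:
            idx_c = np.where(y_sv == c)[0]
            if c != minority and len(idx_c) > n_min:
                # Keep only the majority-class support vectors closest to the
                # hyperplane (smallest absolute decision-function value);
                # this assumes binary classification.
                dist = np.abs(svm.decision_function(X_sv[idx_c]))
                idx_c = idx_c[np.argsort(dist)[:n_min]]
            keep.append(idx_c)
        keep = np.concatenate(keep)
        return X_sv[keep], y_sv[keep]

In this kind of pipeline, the reduced set returned by boundary_balance would then be used to train the downstream classifiers mentioned above (DT, NB, Random Forest, AdaBoost) in place of the original imbalanced data.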
