Fast-CBUS: A fast clustering-based undersampling method for addressing the class imbalance problem

Nir Ofek, Lior Rokach, Roni Stern, Asaf Shabtai

doi:10.1016/j.neucom.2017.03.011

Abstract

Datasets that have imbalanced class distributions pose a challenge for learning and classification algorithms. Imbalanced datasets exist in many domains, such as fraud detection, sentiment analysis, churn prediction, and intrusion detection in computer networks. To solve the imbalance problem, three main approaches are typically used: data resampling, method adaptation, and cost-sensitive learning. Of these, data resampling (either oversampling the minority class instances or undersampling the majority class instances) is the most widely used. However, in most cases, implementing these approaches involves a trade-off between predictive performance and computational complexity. In this paper, we introduce a fast, novel clustering-based undersampling technique for addressing binary-class imbalance problems, which demonstrates high predictive performance while its time complexity is bounded by the number of minority class instances. During the training phase, the algorithm clusters the minority instances and selects a similar number of majority instances from each cluster. A specific classifier is then trained for each cluster. An unlabeled instance is classified as the majority class if it does not fit into any of the clusters. Otherwise, the cluster-specific classifiers are used to return the instance's classification, and their results are weighted by the inverse distance from the clusters. Our evaluation includes several state-of-the-art methods. We plot the Pareto frontier for various datasets to consider both computational cost and predictive performance measures. Extensive experiments demonstrate that only the suggested method is always found on the frontier.
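
The training and classification steps described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation, and several details are assumptions made only for illustration: k-means with a deterministic start as the clustering step, picking the majority instances nearest to each centroid, treating an instance as "fitting" a cluster when it lies within the cluster's radius, and a nearest-class-mean rule as the per-cluster classifier. All function and variable names (`kmeans`, `train`, `predict`, `radius`) are hypothetical, and the sketch makes no attempt to reproduce the time-complexity bound stated in the abstract.

```python
import math

MAJORITY, MINORITY = 0, 1

def mean(points):
    """Coordinate-wise mean of a non-empty list of points."""
    return tuple(sum(axis) / len(points) for axis in zip(*points))

def kmeans(points, k, iters=20):
    """Plain Lloyd's k-means; evenly spaced deterministic start (assumption)."""
    step = max(1, len(points) // k)
    centroids = points[::step][:k]
    groups = []
    for _ in range(iters):
        groups = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)),
                          key=lambda i: math.dist(p, centroids[i]))
            groups[nearest].append(p)
        # Keep the old centroid if a cluster ends up empty.
        centroids = [mean(g) if g else c for g, c in zip(groups, centroids)]
    return [(c, g) for c, g in zip(centroids, groups) if g]

def train(minority, majority, k=2):
    """Cluster the minority class, then build one tiny classifier per cluster."""
    model = []
    for centroid, members in kmeans(minority, k):
        # Assumption: take as many majority points as the cluster has
        # minority points, choosing those nearest to the centroid.
        picked = sorted(majority, key=lambda p: math.dist(p, centroid))[:len(members)]
        # Assumption: the cluster "region" is a ball covering its training points.
        radius = max(math.dist(p, centroid) for p in members + picked)
        model.append({"centroid": centroid, "radius": radius,
                      "min_mean": mean(members), "maj_mean": mean(picked)})
    return model

def predict(model, x):
    """Majority if x fits no cluster; else inverse-distance weighted vote."""
    weighted, total = 0.0, 0.0
    for cluster in model:
        d = math.dist(x, cluster["centroid"])
        if d > cluster["radius"]:
            continue  # x does not fit this cluster
        d_min = math.dist(x, cluster["min_mean"])
        d_maj = math.dist(x, cluster["maj_mean"])
        # Nearest-class-mean score in [0, 1]; higher means more minority-like.
        score = d_maj / (d_min + d_maj) if d_min + d_maj else 0.5
        weight = 1.0 / (d + 1e-9)  # inverse distance to the cluster centroid
        weighted += weight * score
        total += weight
    if total == 0.0:
        return MAJORITY
    return MINORITY if weighted / total > 0.5 else MAJORITY

# Toy data (invented for illustration): two tight minority groups, and
# majority points scattered both near them and far away.
minority = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10), (10, 11), (11, 10), (11, 11)]
majority = [(3, 3), (3, 4), (4, 3), (4, 4), (7, 7), (7, 8), (8, 7), (8, 8),
            (20, 0), (0, 20), (20, 20), (-10, -10), (5, 15), (15, 5)]
model = train(minority, majority, k=2)
```

On this toy data, `predict(model, (0.2, 0.3))` lands inside a minority cluster and is scored by that cluster's classifier, while `predict(model, (20, 0))` fits no cluster and falls through to the majority class without consulting any classifier.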

Journal: Neurocomputing	Publication Date: Mar 9, 2017
Citations: 106

