Abstract

Class imbalance presents a major hurdle in the application of classification methods. A common approach is to learn ensembles of classifiers on rebalanced data, for example bootstrap aggregating (bagging) combined with either undersampling of the majority class or oversampling of the minority class. However, rebalancing methods entail asymmetric changes to the examples of the different classes, which can introduce biases of their own. Furthermore, these methods often require the performance measure of interest to be specified a priori, i.e., before learning. An alternative is threshold moving, which applies a decision threshold to the continuous output of a model and can therefore be adapted to a performance measure a posteriori, i.e., as a plug-in method. Surprisingly, little attention has been paid to the combination of bagging ensembles with threshold moving. In this paper, we study this combination and demonstrate its competitiveness. Unlike resampling methods, it preserves the natural class distribution of the data, resulting in well-calibrated posterior probabilities. We additionally extend the proposed method to handle multiclass data, validate it on binary and multiclass benchmark data sets using both decision trees and neural networks as base classifiers, and perform analyses that provide further insight into the method.
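
To make the idea concrete, below is a minimal sketch of bagging combined with threshold moving. This is our illustration, not the authors' reference implementation; the data set, split, and hyperparameters are assumptions made for the example. The ensemble is trained on the unchanged, imbalanced data, and the decision threshold on its averaged posterior is then tuned a posteriori for the measure of interest (here the F1-score).

    import numpy as np
    from sklearn.datasets import make_classification
    from sklearn.ensemble import BaggingClassifier
    from sklearn.metrics import f1_score
    from sklearn.model_selection import train_test_split
    from sklearn.tree import DecisionTreeClassifier

    # Toy imbalanced problem (roughly 9:1); the natural class distribution is kept.
    X, y = make_classification(n_samples=3000, weights=[0.9], random_state=0)
    X_tr, X_val, y_tr, y_val = train_test_split(X, y, stratify=y, random_state=0)

    # Plain bagging on the unchanged data: no over- or undersampling of any class.
    ens = BaggingClassifier(DecisionTreeClassifier(), n_estimators=100, random_state=0)
    ens.fit(X_tr, y_tr)

    # Continuous output: the averaged per-tree posterior of the minority class.
    scores = ens.predict_proba(X_val)[:, 1]

    # Threshold moving, a posteriori: sweep candidate thresholds on held-out data
    # and keep the one that maximizes the chosen measure (the plug-in step).
    t_star = max(np.unique(scores), key=lambda t: f1_score(y_val, scores >= t))
    y_pred = (scores >= t_star).astype(int)

The same trained ensemble can be re-thresholded for a different performance measure without retraining, which is what makes the approach a plug-in method.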

Highlights

  • Dealing with class imbalance in classification is an important problem that poses major challenges [1]

  • It is worth mentioning that, unsurprisingly, the area under the receiver operating characteristic (ROC) curve showed a much more cluttered picture, which we omit in the interest of space

  • PTMA, Roughly Balanced (RB)- and Exactly Balanced (EB)-bagging perform better on macro-accuracy, while PTF1, SMOTE- and Random Balance (RNB)-bagging perform better on the macro F1-score, showing that different resampling mechanisms suit different performance measures. PT-bagging, with appropriate thresholds, performed well on every evaluated measure, whereas each of the remaining methods performed poorly on at least one of them, e.g., RB-bagging on the macro F1-score and AUCPR, and SMOTE- and RNB-bagging on macro-accuracy

Introduction

Dealing with class imbalance in classification is an important problem that poses major challenges [1]. Standard learning algorithms are often guided by global error rates and may ignore instances of the minority class, leading to models biased towards predicting the majority class. A common first choice is to preprocess the data by resampling to balance the class distribution [8,9]. This is often achieved by either randomly oversampling (ROS) the minority class [9] or randomly undersampling (RUS) the majority class [10].
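
For concreteness, here is a minimal sketch of these two baselines. It is our illustration, not code from the paper; practical pipelines would typically rely on a library such as imbalanced-learn, and the function names below are hypothetical.

    import numpy as np

    rng = np.random.default_rng(0)

    def random_oversample(X, y, minority=1):
        # ROS: duplicate minority examples (drawn with replacement) until the
        # two classes are the same size.
        min_idx = np.flatnonzero(y == minority)
        maj_idx = np.flatnonzero(y != minority)
        extra = rng.choice(min_idx, size=len(maj_idx) - len(min_idx), replace=True)
        idx = np.concatenate([maj_idx, min_idx, extra])
        return X[idx], y[idx]

    def random_undersample(X, y, minority=1):
        # RUS: discard majority examples (drawn without replacement) down to
        # the minority-class size.
        min_idx = np.flatnonzero(y == minority)
        maj_idx = np.flatnonzero(y != minority)
        kept = rng.choice(maj_idx, size=len(min_idx), replace=False)
        idx = np.concatenate([kept, min_idx])
        return X[idx], y[idx]

Both are asymmetric transformations of the data: ROS risks overfitting to duplicated minority points, while RUS discards potentially informative majority examples, which is the kind of bias the abstract alludes to.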
