Bagging Using Instance-Level Difficulty for Multi-Class Imbalanced Big Data Classification on Spark

William C Sleeman IV, Bartosz Krawczyk

doi:10.1109/bigdata47090.2019.9006058

Abstract

Most machine learning methods assume that classes contain a roughly balanced number of instances. However, in many real-life problems some types of instances appear far more frequently than others, which biases classifier training towards the majority class. This becomes even more challenging with multiple classes, where the relationships between them are not easily defined. Learning from multi-class imbalanced data has not been widely considered in the context of big data mining, even though this difficulty appears frequently in the domain. In this paper, we address this challenge by proposing a comprehensive ensemble-based framework. We analyze each class to extract instance-level characteristics that describe the difficulty of its instances, and we embed this information into the existing UnderBagging framework. Our ensemble samples instances with probabilities proportional to their difficulty levels. This focuses the learning process on the most difficult instances and better captures the properties of multi-class imbalanced problems. We implemented our framework on Apache Spark to allow for high-performance computing over big data sets. Our experimental study shows that taking instance-level difficulty into account leads to significantly more accurate ensembles.
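The sampling idea in the abstract can be illustrated with a small single-machine sketch. This is not the authors' Spark implementation: the abstract does not say how difficulty is measured, so the k-nearest-neighbour score below (`knn_difficulty`), the smoothing constant `eps`, and the function names are all assumptions made for illustration. The sketch only shows how one under-sampled bag could be drawn with per-instance probabilities proportional to a difficulty score.

```python
import numpy as np

def knn_difficulty(X, y, k=5):
    """Hypothetical difficulty score (an assumption, not taken from the paper):
    the fraction of an instance's k nearest neighbours that belong to a
    different class. 0 means a 'safe' instance, 1 the hardest."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    # Pairwise squared Euclidean distances; fine for a small in-memory demo.
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=2)
    np.fill_diagonal(d2, np.inf)  # an instance is not its own neighbour
    nn = np.argsort(d2, axis=1)[:, :k]
    return (y[nn] != y[:, None]).mean(axis=1)

def difficulty_weighted_bag(y, difficulty, rng, eps=0.05):
    """Draw one under-sampled bag: each class contributes as many instances
    as the smallest class, sampled with replacement and with probability
    proportional to difficulty. eps (assumed) keeps safe instances selectable."""
    y = np.asarray(y)
    classes, counts = np.unique(y, return_counts=True)
    n_per_class = counts.min()
    parts = []
    for c in classes:
        idx = np.flatnonzero(y == c)
        w = difficulty[idx] + eps
        parts.append(rng.choice(idx, size=n_per_class, replace=True, p=w / w.sum()))
    return np.concatenate(parts)

# Toy three-class imbalanced data set (200 / 40 / 10 instances).
rng = np.random.default_rng(0)
X = np.vstack([
    rng.normal(0.0, 1.0, (200, 2)),
    rng.normal(2.0, 1.0, (40, 2)),
    rng.normal(-2.0, 1.0, (10, 2)),
])
y = np.array([0] * 200 + [1] * 40 + [2] * 10)

diff = knn_difficulty(X, y)
bag = difficulty_weighted_bag(y, diff, rng)  # indices for one ensemble member
```

In a full ensemble, one base classifier would be trained on each such bag and the predictions combined; on Spark the same per-class weighted sampling would have to be expressed over distributed partitions, which this sketch does not attempt.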

Similar Papers

Multi-class imbalanced big data classification on Spark
William C Sleeman IV ... Bartosz Krawczyk
Knowledge-Based Systems | VOL. 212
07 Nov 2020

A Comprehensive Analysis on Multi-class Imbalanced Big Data Classification
R Madhura Prabha ... S Sasikala
01 Jan 2021

A survey of multi-class imbalanced data classification methods
Meng Han ... Shujuan Liu
Journal of Intelligent & Fuzzy Systems | VOL. 44
30 Jan 2023

An empirical comparison on state-of-the-art multi-class imbalance learning algorithms and a new diversified ensemble learning scheme
Jingjun Bi ... Chongsheng Zhang
Knowledge-Based Systems | VOL. 158
04 Jun 2018
