Experimental evaluation of ensemble classifiers for imbalance in Big Data

Mario Juez-Gil, Álvar Arnaiz-González, Juan J Rodríguez, César García-Osorio

doi:10.1016/j.asoc.2021.107447

Abstract

Datasets are growing in size and complexity at an unprecedented pace, giving rise to what is known as Big Data. A common problem for classification, especially in Big Data, is that the examples of the different classes might not be balanced. Imbalanced classification was introduced some decades ago to correct the tendency of classifiers to favor the majority class and ignore the minority one. Although the number of imbalanced classification methods has increased, they continue to focus on normal-sized datasets rather than on the new reality of Big Data. In this paper, in-depth experimentation with ensemble classifiers is conducted in the context of imbalanced Big Data classification, using two popular ensemble families (Bagging and Boosting) and different resampling methods. All the experiments were run on Spark clusters, and ensemble performance and execution times were compared using statistical tests, including recent ones based on the Bayesian approach. One notable conclusion was that, in the context of Big Data, simpler methods applied to imbalanced datasets provided better results than complex ones. The additional complexity of some of the sophisticated methods, which appears necessary to process and reduce imbalance in normal-sized datasets, was not effective for imbalanced Big Data.
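
The abstract does not say which resampling methods were evaluated. As a minimal sketch of what a simple resampling step could look like, the Python snippet below applies random undersampling to a plain list of labels. Random undersampling is an assumption chosen for illustration and is not necessarily one of the methods in the paper. The function name `random_undersample` and the toy 95/5 label split are hypothetical, and this single-machine example does not reproduce the Spark setup used in the study.

```python
import random
from collections import Counter


def random_undersample(labels, seed=0):
    """Return sorted indices that keep every class at the minority-class size.

    Illustrative only: a hypothetical helper, not code from the paper.
    """
    rng = random.Random(seed)

    # Group example indices by class label.
    by_class = {}
    for idx, label in enumerate(labels):
        by_class.setdefault(label, []).append(idx)

    # Every class is reduced to the size of the smallest class.
    target = min(len(indices) for indices in by_class.values())

    kept = []
    for indices in by_class.values():
        # Sample without replacement from each class.
        kept.extend(rng.sample(indices, target))
    return sorted(kept)


# Toy imbalanced label vector (assumed data): 95 majority vs. 5 minority examples.
labels = [0] * 95 + [1] * 5
kept = random_undersample(labels)
balanced = Counter(labels[i] for i in kept)
print(balanced)
```

The returned indices could then be used to select the training subset for each ensemble member. Discarding majority examples is cheap, which is one reason such a method might scale more easily than approaches that synthesize new examples.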

Journal: Applied Soft Computing
Publication Date: May 7, 2021
Citations: 18
License type: CC BY-NC-ND

Similar Papers

Output Thresholding for Ensemble Learners and Imbalanced Big Data
Justin M Johnson ... Taghi M Khoshgoftaar
01 Nov 2021

Imbalanced big data classification based on virtual reality in cloud computing
Wen-Da Xie ... Xiaochun Cheng
Multimedia Tools and Applications | VOL. 79
20 Feb 2019

Multiclass Imbalanced Big Data Classification Utilizing Spark Cluster
Tinku Singh ... Manish Kumar
06 Jul 2021

Human Activity Recognition in Imbalanced Big Data Using Fuzzy Rule-Based Classification System
Khyati Ahlawat ... Amit Prakash Singh
01 Jan 2020
