Comparative Evaluation of Imbalanced Data Management Techniques for Solving Classification Problems on Imbalanced Datasets

Tanawan Watthaisong, Nipotepat Muangkote, Khamron Sunat

doi:10.19139/soic-2310-5070-1890

Abstract

Dealing with imbalanced data is crucial and challenging when developing effective machine-learning models for classification. Without proper data management, class imbalance significantly degrades a classification model's performance and leads to suboptimal results. Many methods for managing imbalanced data have been studied and developed to improve data balance. In this paper, we conduct a comparative study to assess how a ranking technique influences the evaluation of 66 traditional methods for addressing imbalanced data. Three classifiers, Decision Tree, Random Forest, and XGBoost, serve as the classification models. The experiments are divided into two parts: the first evaluates the performance of the various imbalanced-data handling methods, and the second compares the performance of the top 4 oversampling methods. The study covers 50 datasets: 20 retrieved from the UCI repository and 30 from the OpenML repository. The evaluation is based on the F-Measure and on statistical methods, including the Kruskal-Wallis test and the Borda Count, which are used to rank the imbalance-handling capabilities of the 66 methods. SMOTE serves as the benchmark for comparison because of its popularity in handling imbalanced data. In the experimental results, MCT, Polynom-fit-SMOTE, and CBSO were the top three performers, demonstrating superior effectiveness in managing imbalanced datasets. This research can serve as a practical guide for practitioners choosing suitable techniques for data management.
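
The abstract names the Borda Count as one of the tools used to rank methods across datasets but does not spell out the exact variant. The following is a minimal sketch of one common formulation, not the authors' implementation: on each dataset with n methods, the best F-Measure earns n-1 points and the worst earns 0, ties share the average of the points they span, and points are summed over datasets. The function name `borda_count`, the placeholder method and dataset names, and all score values are hypothetical and invented purely for illustration.

```python
from collections import defaultdict


def borda_count(scores_by_dataset):
    """Aggregate per-dataset scores into Borda points (higher is better).

    scores_by_dataset: dict mapping dataset name -> {method name: score}.
    On a dataset with n methods the best method earns n-1 points and the
    worst earns 0; tied methods share the average of the points they span.
    Returns a list of (method, total_points), best first.
    """
    totals = defaultdict(float)
    for scores in scores_by_dataset.values():
        ordered = sorted(scores.items(), key=lambda kv: kv[1])  # worst first
        i = 0
        while i < len(ordered):
            # Extend j to cover every method tied with the one at position i.
            j = i
            while j + 1 < len(ordered) and ordered[j + 1][1] == ordered[i][1]:
                j += 1
            shared = (i + j) / 2  # average of positions i..j
            for method, _ in ordered[i:j + 1]:
                totals[method] += shared
            i = j + 1
    return sorted(totals.items(), key=lambda kv: kv[1], reverse=True)


# Hypothetical F-Measure values, invented for illustration only.
toy_scores = {
    "dataset_a": {"method_1": 0.80, "method_2": 0.75, "method_3": 0.70},
    "dataset_b": {"method_1": 0.60, "method_2": 0.65, "method_3": 0.55},
    "dataset_c": {"method_1": 0.90, "method_2": 0.85, "method_3": 0.85},
}
ranking = borda_count(toy_scores)
for method, points in ranking:
    print(method, points)
```

Because the Borda Count uses only the per-dataset ordering, a method that wins by a wide margin on one dataset gains no more than one that wins narrowly, which is one reason rank aggregation is often preferred over averaging raw scores across heterogeneous datasets.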

More From: Statistics, Optimization & Information Computing

Similar Papers

A Method for Analyzing the Performance Impact of Imbalanced Binary Data on Machine Learning Models
Ming Zheng ... Yuhao Miao
Axioms | VOL. 11
01 Nov 2022

Feature selection for classification using WGCNA and Spread Sub-Sample for an imbalanced rheumatoid arthritis RNASEQ data
Consolata Gakii ... Boaz Too
Informatics in Medicine Unlocked | VOL. 43
01 Jan 2023

ForesTexter: An efficient random forest algorithm for imbalanced text categorization
Qingyao Wu ... Shen-Shyang Ho
Knowledge-Based Systems | VOL. 67
19 Jun 2014

Comparing the classification performances of supervised classifiers with balanced and imbalanced SAR data sets
Mustafa Üstüner ... Füsun Balık Şanlı
01 May 2018
