A Survey Study on Proposed Solutions for Imbalanced Big Data

Shaymaa Ahmed Razoqi, Ghayda A.A Al-Talib

doi:10.24996/ijs.2024.65.3.37

Abstract

Learning from imbalanced data has been an active research topic for more than two decades. Training data is considered imbalanced when the positive (minority) class is so small relative to the negative (majority) class that it tends to be neglected, a problem compounded by skewed class distributions in binary tasks. The emergence of big data adds new challenges to the imbalance problem; these are commonly summarized as the 5 Vs: volume, velocity, veracity, value, and variety. This study divides solutions to the data imbalance problem into three levels: data-level, algorithm-level, and hybrid approaches. It first outlines the standard solutions proposed for the problem and identifies the most important metrics adopted for measuring classification performance on imbalanced data. The survey reviews 27 studies published between 2015 and 2022, grouped by the level at which they treat the imbalance problem, along with the performance metrics used in these studies and the sources of the datasets to which the solutions were applied. The study makes it easier for researchers to find solutions to the data imbalance problem, including recently used hybrid approaches, and to apply them to improve classification.
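
As a rough illustration of two ideas the abstract mentions (a data-level remedy and imbalance-aware metrics), the following Python sketch may help. It is not taken from the survey: the function names `random_oversample` and `imbalance_metrics`, the 95/5 toy split, and the choice of random oversampling, recall, F1, and G-mean are all assumptions made for this example.

```python
import random
from collections import Counter


def random_oversample(features, labels, seed=0):
    """Data-level sketch: duplicate randomly chosen rows of each smaller
    class until every class is as large as the largest one."""
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())
    out_x, out_y = list(features), list(labels)
    for cls, n in counts.items():
        pool = [x for x, y in zip(features, labels) if y == cls]
        for _ in range(target - n):
            out_x.append(rng.choice(pool))
            out_y.append(cls)
    return out_x, out_y


def imbalance_metrics(y_true, y_pred, positive=1):
    """Compute accuracy plus metrics that stay informative under imbalance."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    recall = tp / (tp + fn) if tp + fn else 0.0
    specificity = tn / (tn + fp) if tn + fp else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {
        "accuracy": (tp + tn) / len(pairs),
        "recall": recall,
        "precision": precision,
        "f1": f1,
        "g_mean": (recall * specificity) ** 0.5,
    }


# Toy data (hypothetical): 95 negative rows, 5 positive rows.
X = [[i] for i in range(100)]
y = [0] * 95 + [1] * 5

# A classifier that always predicts the majority class looks accurate
# but never finds a positive example.
always_negative = [0] * 100
metrics = imbalance_metrics(y, always_negative)

# After oversampling, both classes have the same number of rows.
X_bal, y_bal = random_oversample(X, y)
```

On this toy split the always-negative predictor scores high accuracy while its recall and G-mean are zero, which is why accuracy alone is usually a poor guide on imbalanced data. Random oversampling is only one simple data-level option; it duplicates existing rows rather than creating new information.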

Journal: Iraqi Journal of Science | Publication Date: Mar 29, 2024
Citations: 1 | License type: CC BY

Similar Papers

Ensemble Learning Based on Active Example Selection for Solving Imbalanced Data Problem in Biomedical Data
Min Su Lee ... Sangyoon Oh
01 Nov 2009

CDBH: A clustering and density-based hybrid approach for imbalanced data classification
Behzad Mirzaei ... Hossein Nezamabadi-Pour
Expert Systems with Applications | VOL. 164
28 Sep 2020

SMOTE: POTENSI DAN KEKURANGANNYA PADA SURVEI
Eka N Kencana ... Ni Putu Yulika Trisna Wijayanti
E-Jurnal Matematika | VOL. 10
30 Nov 2021

Over- and Under-sampling Approach for Extremely Imbalanced and Small Minority Data Problem in Health Record Analysis.
Koichi Fujiwara ... Mai Kamaguchi
Frontiers in Public Health | VOL. 8
19 May 2020
