An Ensemble-Based Semi-Supervised Learning Approach for Non-Stationary Imbalanced Data Streams with Label Scarcity

Yousef Abdi, Mohammad Asadpour, Mohammad-Reza Feizi-Derakhshi

doi:10.1016/j.asoc.2024.112353

Abstract

Addressing the challenges of learning from multi-class imbalanced data streams, particularly in scenarios with scarce labeled data and concept drift, remains an open problem in data stream mining. Despite the prevalence of such scenarios in real-world applications, existing approaches have yet to provide effective solutions. In this paper, we propose GMCSSEL, a novel chunk-based semi-supervised framework that leverages an ensemble of base classifiers trained on micro-cluster centers whose labels are inferred through graph fusion and label propagation. Our approach performs chunk-based incremental label propagation by integrating both current and previously inferred label information into the propagation equation, with regularization parameters controlling the influence of each. Furthermore, our method includes a novel concept drift detection mechanism specifically designed for imbalanced data with label scarcity. The imbalanced data problem is addressed through a combination of graph fusion, label matrix normalization, and SMOTE. Experimental results on synthetic data streams with varying class ratios and concept drifts, as well as on real multi-class streams, demonstrate that our approach achieves superior classification performance compared to the IOE semi-supervised algorithm. Our method achieves an average improvement of 12.5% in the evaluation metrics across multiple data streams, with improvements ranging from 8% to 18% in G-mean, AUC, Kappa, and F1-score. Statistical analysis confirms that these improvements are significant, highlighting the robustness of our approach in handling non-stationary imbalanced data streams with label scarcity.
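
The abstract does not state the GMCSSEL propagation equation, so the block below is only a minimal sketch of the general idea under stated assumptions. It extends the classic normalized-graph label propagation of Zhou et al. (2004) with an extra penalty lam * ||F - P||^2 that ties the new label scores F to a hypothetical matrix P of scores inferred for the same nodes in earlier chunks, while mu weights the current chunk's labels Y. "Label matrix normalization" is assumed here to mean scaling each class column of Y to unit mass, and a single Gaussian graph stands in for graph fusion. The function names and the parameters mu, lam and sigma are illustrative, not the authors'.

```python
import numpy as np


def rbf_affinity(X: np.ndarray, sigma: float = 1.0) -> np.ndarray:
    """Dense Gaussian affinity matrix with a zero diagonal (no self-loops).
    Stand-in for the paper's fused graph, which is not reproduced here."""
    sq = np.sum(X ** 2, axis=1)
    d2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * X @ X.T, 0.0)
    W = np.exp(-d2 / (2.0 * sigma ** 2))
    np.fill_diagonal(W, 0.0)
    return W


def normalize_label_columns(Y: np.ndarray) -> np.ndarray:
    """Assumed normalization: give every class column unit total mass so a
    class with many labeled nodes does not outweigh a class with few."""
    mass = Y.sum(axis=0)
    mass[mass == 0] = 1.0  # classes unlabeled in this chunk stay all-zero
    return Y / mass


def incremental_label_propagation(W: np.ndarray, Y: np.ndarray, P: np.ndarray,
                                  mu: float = 0.5, lam: float = 0.5) -> np.ndarray:
    """Closed-form minimizer of
        tr(F^T (I - S) F) + mu * ||F - Y||^2 + lam * ||F - P||^2,
    with S = D^{-1/2} W D^{-1/2}. Setting the gradient to zero gives
        ((1 + mu + lam) I - S) F = mu * Y + lam * P,
    which is invertible whenever mu + lam > 0 (eigenvalues of S lie in [-1, 1])."""
    d = W.sum(axis=1)
    d_inv_sqrt = 1.0 / np.sqrt(np.maximum(d, 1e-12))  # guard isolated nodes
    S = d_inv_sqrt[:, None] * W * d_inv_sqrt[None, :]
    A = (1.0 + mu + lam) * np.eye(W.shape[0]) - S
    return np.linalg.solve(A, mu * Y + lam * P)


rng = np.random.default_rng(0)
# Hypothetical chunk: 20 "majority" and 5 "minority" micro-cluster centers.
X = np.vstack([rng.normal(0.0, 0.3, size=(20, 2)),
               rng.normal(5.0, 0.3, size=(5, 2))])
W = rbf_affinity(X)

# Chunk 1: one labeled center per class and no history yet (P = 0).
Y1 = np.zeros((25, 2))
Y1[0, 0] = 1.0
Y1[20, 1] = 1.0
F1 = incremental_label_propagation(W, normalize_label_columns(Y1), np.zeros_like(Y1))

# Chunk 2: only a majority label arrives; the history term P = F1 carries the
# minority class forward (assumes the same centers persist between chunks).
Y2 = np.zeros((25, 2))
Y2[0, 0] = 1.0
F2 = incremental_label_propagation(W, normalize_label_columns(Y2), F1)

print(F1.argmax(axis=1))
print(F2.argmax(axis=1))
```

With lam = 0 the update reduces to ordinary single-chunk propagation. In the second chunk no minority label is observed, yet the history term alone keeps the five minority centers assigned to class 1, which is the kind of behavior an incremental formulation is meant to provide under label scarcity. Micro-clustering, graph fusion, SMOTE, the ensemble, and drift detection are not shown.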


Published in: Applied Soft Computing

Similar Papers

Incremental learning imbalanced data streams with concept drift: The dynamic updated ensemble algorithm
Zeng Li ... Tuanfei Zhu
Knowledge-Based Systems | VOL. 195 | 27 Feb 2020

Online Oversampling for Sparsely Labeled Imbalanced and Non-Stationary Data Streams
Lukasz Korycki ... Bartosz Krawczyk
01 Jul 2020

Data Preprocessing and Dynamic Ensemble Selection for Imbalanced Data Stream Classification
Paweł Zyblewski ... Michał Woźniak
01 Jan 2020

RETRACTED ARTICLE: Comprehensive analysis for class imbalance data with concept drift using ensemble based classification
S Priya ... R Annie Uthra
Journal of Ambient Intelligence and Humanized Computing | VOL. 12 | 11 Apr 2020
