Online semi-supervised active learning ensemble classification for evolving imbalanced data streams

Yinan Guo, Jiayang Pu, Botao Jiao, Yanyan Peng, Dini Wang, Shengxiang Yang

doi:10.1016/j.asoc.2024.111452

Abstract

Concept drift is a core challenge in classifying data streams. Although many drift adaptation methods have been proposed, most assume that labels are available for all data, which is impractical in many real-world applications. In addition, the absence of labels makes it difficult to obtain the imbalance ratio of an imbalanced data stream in time, which gives inaccurate guidance for resampling and leads to poor generalization. To tackle these joint challenges, an online semi-supervised active learning method is proposed to classify imbalanced data streams with concept drift. A newly arrived instance is first added to the sliding window and then assigned a pseudo label according to its nearest cluster. Meanwhile, a semi-supervised clustering algorithm provides its predicted label. Based on these two predicted labels, a cluster-based query strategy provides the criteria for evaluating and selecting representative instances. In particular, the uncertainty and importance of each instance are defined to jointly evaluate its representativeness. After the true labels of the selected instances are obtained, the ensemble classifier is updated with all instances in the current sliding window. Experimental results on 13 synthetic and real data streams indicate that the proposed method outperforms six comparative methods in both G-mean and Recall under various labeling budgets.
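
Since the evaluation is reported in G-mean and Recall, a brief illustration of why these metrics suit imbalanced streams may help. The sketch below uses the standard definition of G-mean as the geometric mean of per-class recalls; the function names and the toy labels are illustrative assumptions and are not taken from the paper.

```python
from collections import Counter
from math import prod


def per_class_recall(y_true, y_pred):
    """Return {class: recall}, where recall = correct predictions / actual count."""
    totals = Counter(y_true)
    hits = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    return {c: hits[c] / totals[c] for c in totals}


def g_mean(y_true, y_pred):
    """Geometric mean of the per-class recalls (standard G-mean definition)."""
    recalls = list(per_class_recall(y_true, y_pred).values())
    return prod(recalls) ** (1.0 / len(recalls))


# Hypothetical toy data: 8 majority-class (0) and 2 minority-class (1) instances.
y_true = [0] * 8 + [1] * 2
y_all_majority = [0] * 10          # a classifier that ignores the minority class
y_one_miss = [0] * 8 + [0, 1]      # a classifier that finds one of two minority instances

print(g_mean(y_true, y_all_majority))  # accuracy is 0.8, yet G-mean is 0.0
print(g_mean(y_true, y_one_miss))      # minority recall 0.5, majority recall 1.0
```

In this toy case, a classifier that always predicts the majority class reaches 80% accuracy but a G-mean of zero, which is why G-mean is commonly preferred over accuracy when classes are imbalanced.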

Full Text