Abstract

Concept drift is a core challenge in data stream classification. Although many drift adaptation methods have been proposed, most of them assume that the labels of all data are available, which is impractical in many real-world applications. Additionally, the absence of labels makes it difficult to obtain the imbalance ratio of an imbalanced data stream in time, which provides inaccurate guidance for resampling and leads to poor generalization. To tackle these joint challenges, an online semi-supervised active learning method is proposed to classify imbalanced data streams with concept drift. A newly arrived instance is first added to the sliding window and then assigned a pseudo label according to its nearest cluster. Meanwhile, a semi-supervised clustering algorithm provides its predicted label. Based on these two predicted labels, a cluster-based query strategy provides the criteria for evaluating and selecting representative instances. More specifically, the uncertainty and importance of an instance are defined to jointly evaluate its representativeness. After the true labels of the selected representative instances are obtained, the ensemble classifier is updated with all instances in the current sliding window. Experimental results on 13 synthetic and real data streams indicate that the proposed method outperforms six comparative methods in both G-mean and Recall under various labeling budgets.
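
For readers unfamiliar with the evaluation metrics named above, the following is a minimal sketch of how per-class Recall and G-mean are commonly computed, assuming the usual definition of G-mean as the geometric mean of the per-class recalls. The function names `per_class_recall` and `g_mean` and the toy labels are illustrative assumptions; they are not taken from the paper, whose exact evaluation protocol may differ.

```python
import math
from collections import defaultdict


def per_class_recall(y_true, y_pred):
    """Return {class: recall} for every class that appears in y_true."""
    hits = defaultdict(int)
    totals = defaultdict(int)
    for t, p in zip(y_true, y_pred):
        totals[t] += 1
        if t == p:
            hits[t] += 1
    return {c: hits[c] / totals[c] for c in totals}


def g_mean(y_true, y_pred):
    """Geometric mean of per-class recalls (assumed definition of G-mean)."""
    recalls = list(per_class_recall(y_true, y_pred).values())
    if not recalls:
        raise ValueError("y_true must contain at least one label")
    return math.prod(recalls) ** (1.0 / len(recalls))


# Toy imbalanced example (hypothetical data): 8 majority, 2 minority labels,
# scored against a classifier that always predicts the majority class.
y_true = [0] * 8 + [1] * 2
y_pred = [0] * 10

accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
print("accuracy:", accuracy)
print("recall per class:", per_class_recall(y_true, y_pred))
print("G-mean:", g_mean(y_true, y_pred))
```

In this toy case the always-majority classifier keeps a high accuracy while its minority-class recall, and therefore its G-mean, is zero. This is why G-mean and Recall are more informative than accuracy on imbalanced streams.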
