High-Dimensional Multi-Label Data Stream Classification With Concept Drifting Detection

Peipei Li, Xindong Wu, Haixiang Zhang, Xuegang Hu

doi:10.1109/tkde.2022.3200068

Abstract

Multi-label data streams, such as Web texts and images, have become common on the Web. These data are characterized by multiple labels, high dimensionality, high volume, high velocity and, especially, concept drift. Multi-label data stream classification is therefore a challenging and significant task, particularly when handling high-dimensional data with concept drifts. However, this challenge has received little attention from the research community. We therefore propose a max-relevance and min-redundancy based algorithm adaptation approach for efficient and effective classification of multi-label data streams with high-dimensional attributes and concept drifts. To reduce the impact of high-dimensional data with noisy attributes, we first refine the minimal-redundancy-maximal-relevance criterion, based on mutual information, to select qualified features in multi-label data streams. Second, we propose a data distribution based concept drift detection approach to identify concept drifts hidden in data streams. Finally, we build an incremental ensemble classification model for efficiently classifying multi-label data streams. Extensive studies show that our approach can obtain optimal feature subsets while maintaining good multi-label classification performance, compared with several state-of-the-art multi-label feature selection algorithms that use two efficient multi-label classification methods as base classifiers. Our approach is also superior to three well-known multi-label data stream classification approaches in both effectiveness and efficiency. Source code and data sets are available at https://github.com/peipeilihfut/MLStreamClassification.
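
To make the feature selection step concrete, the following is a minimal sketch of the classical greedy minimal-redundancy-maximal-relevance (mRMR) criterion extended to several label columns. It is not the refined criterion proposed in the paper: the function names `mutual_information` and `mrmr_select`, the use of the mean mutual information over labels as the relevance term, and the difference form (relevance minus mean redundancy) are all assumptions made for illustration, and the features are assumed to be discrete.

```python
import math
from collections import Counter


def mutual_information(xs, ys):
    """Empirical mutual information (in nats) between two discrete sequences."""
    n = len(xs)
    px, py, pxy = Counter(xs), Counter(ys), Counter(zip(xs, ys))
    # sum over observed pairs of p(x, y) * log(p(x, y) / (p(x) * p(y)))
    return sum(
        (c / n) * math.log(c * n / (px[x] * py[y]))
        for (x, y), c in pxy.items()
    )


def mrmr_select(features, labels, k):
    """Greedily pick k feature indices by relevance minus redundancy.

    features and labels are lists of discrete columns of equal length.
    Relevance is the mean MI with all label columns (an assumed aggregation,
    not taken from the paper).
    """
    relevance = [
        sum(mutual_information(f, lab) for lab in labels) / len(labels)
        for f in features
    ]
    selected = []
    remaining = list(range(len(features)))

    def score(j):
        if not selected:
            return relevance[j]
        # Redundancy: mean MI between the candidate and the chosen features.
        redundancy = sum(
            mutual_information(features[j], features[s]) for s in selected
        ) / len(selected)
        return relevance[j] - redundancy

    while remaining and len(selected) < k:
        best = max(remaining, key=score)
        selected.append(best)
        remaining.remove(best)
    return selected


# Toy data: feature 1 duplicates feature 0, feature 2 matches the second label.
label_a = [0, 0, 1, 1, 0, 0, 1, 1]
label_b = [0, 1, 0, 1, 0, 1, 0, 1]
toy_features = [list(label_a), list(label_a), list(label_b)]
chosen = mrmr_select(toy_features, [label_a, label_b], k=2)
```

In this toy example the redundancy term penalizes the duplicated feature, so the second pick is the feature that carries information about the other label. A streaming variant would need to maintain the counts incrementally per data chunk, which this sketch does not attempt.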

Full Text