Mining non-stationary data streams is challenging because streams are unbounded in length and change dynamically, and because of concept drift, concept evolution, and limited labeled data. Although concept drift and concept evolution in data streams have attracted increasing attention, most existing methods are supervised, which is likely to degrade both classification performance and efficiency when labeled data are scarce. In this paper, we propose ESCR, a semi-supervised framework for recurring concept drift and novel class detection, which aims to detect recurring concept drift and concept evolution in data streams with partially labeled data. The framework is built on an ensemble of clustering-based classifiers. To detect recurring concept drifts, we apply a Jensen–Shannon divergence based change detection technique to classifier confidence scores rather than to the classification error rate. We account for concept evolution by monitoring outliers that exhibit strong cohesion. We further improve the execution efficiency of the framework by exploiting a recursive function and dynamic programming. Finally, extensive experiments on both benchmark and synthetic data sets demonstrate the effectiveness and efficiency of the proposed framework in handling data streams with recurring concept drifts and concept evolution, compared with several well-known semi-supervised data stream classification methods.
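To illustrate the idea of detecting change from confidence scores rather than error rates, the following is a minimal sketch, not the ESCR detector itself. It assumes confidence scores lie in [0, 1], compares a reference window with a recent window through their histograms, and flags drift when the Jensen–Shannon divergence exceeds a threshold. The function names, the bin count, and the threshold value are all hypothetical choices made for this example.

```python
import math

def histogram(scores, bins=10):
    """Normalized histogram of confidence scores assumed to lie in [0, 1]."""
    if not scores:
        raise ValueError("scores must be non-empty")
    counts = [0] * bins
    for s in scores:
        # Clamp the bin index so that s == 1.0 falls into the last bin.
        idx = max(0, min(int(s * bins), bins - 1))
        counts[idx] += 1
    total = len(scores)
    return [c / total for c in counts]

def kl_divergence(p, q):
    """Kullback-Leibler divergence in bits; terms with p_i == 0 contribute 0."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def js_divergence(p, q):
    """Jensen-Shannon divergence in bits, bounded in [0, 1]."""
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)

def drift_detected(reference_scores, recent_scores, threshold=0.1, bins=10):
    """Flag a change when the two confidence distributions diverge.

    The threshold is an illustrative assumption, not a value from the paper.
    """
    ref = histogram(reference_scores, bins)
    cur = histogram(recent_scores, bins)
    return js_divergence(ref, cur) > threshold

# Usage: confident predictions followed by a window of uncertain ones.
reference = [0.90, 0.95, 0.85, 0.92]
recent = [0.40, 0.50, 0.45, 0.55]
print(drift_detected(reference, recent))
```

Because confidence scores are available for unlabeled instances as well, a detector of this kind does not need true labels, which is what makes it suitable for a partially labeled stream.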