Abstract

Mining non-stationary data streams is a challenging task because of their infinite length and dynamic characteristics, in addition to the issues of concept drift, concept evolution, and limited labeled data. Although concept drift and concept evolution in data streams have attracted increasing attention, most existing methods are supervised in nature, which is likely to result in worse classification performance and lower efficiency when labeled data are scarce. In this paper, we propose ESCR, a semi-supervised framework for recurring concept drift and novel class detection, which aims to detect recurring concept drift and concept evolution in data streams with partially labeled data. The framework is built on an ensemble model consisting of several clustering-based classifiers. To detect recurring concept drifts, we apply a Jensen–Shannon divergence based change detection technique to classifier confidence scores instead of the classification error rate. Meanwhile, we take concept evolution into consideration by monitoring outliers with strong cohesion. We further improve the execution efficiency of the framework by exploiting recursive functions and dynamic programming. Finally, extensive experiments conducted on both benchmark and synthetic data sets demonstrate the effectiveness and efficiency of the proposed semi-supervised framework in handling data streams with recurring concept drifts and concept evolution, compared with several well-known semi-supervised data stream classification methods.
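
As a rough illustration of the change detection idea mentioned above, the following minimal sketch compares the distribution of classifier confidence scores in a reference window against a recent window using Jensen–Shannon divergence. It is not the ESCR implementation: the function names, the histogram binning, the window contents, and the threshold of 0.1 are all assumptions made for illustration, and the abstract does not specify how the detector is actually configured.

```python
import math


def histogram(scores, bins=10):
    """Normalized histogram of confidence scores assumed to lie in [0, 1]."""
    if not scores:
        raise ValueError("scores must be non-empty")
    counts = [0] * bins
    for s in scores:
        # Clamp the bin index so that 0.0 and 1.0 fall into valid bins.
        idx = max(0, min(int(s * bins), bins - 1))
        counts[idx] += 1
    return [c / len(scores) for c in counts]


def kl(p, q):
    """Kullback-Leibler divergence in bits; terms with p_i == 0 contribute 0."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def js_divergence(p, q):
    """Jensen-Shannon divergence (base 2), which lies in [0, 1]."""
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)


def drift_detected(reference, recent, threshold=0.1, bins=10):
    """Flag a change when the two score distributions diverge beyond a threshold.

    The threshold value is a hypothetical choice, not one taken from the paper.
    """
    p = histogram(reference, bins)
    q = histogram(recent, bins)
    return js_divergence(p, q) > threshold


# Hypothetical windows: confident predictions, then a drop in confidence.
stable = [0.9] * 50
shifted = [0.2] * 50
print(drift_detected(stable, stable))
print(drift_detected(stable, shifted))
```

Monitoring confidence scores rather than error rates means the check does not need true labels for the recent window, which fits the partially labeled setting the abstract describes.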
