A Semi-supervised Ensemble Approach for Mining Data Streams

Jing Liu, Guo-Sheng Xu, Xin-Xin Niu, Da Xiao, Li-Ze Gu

doi:10.4304/jcp.8.11.2873-2879

Abstract

Mining data streams poses several challenges, including infinite length, evolving nature, and a lack of labeled instances. To address them, this paper presents a semi-supervised ensemble approach for mining data streams. The stream is divided into data chunks to cope with its infinite length. An ensemble classification model E is trained on the existing labeled data chunks, and a decision boundary is constructed from E to detect novel classes. New labeled data chunks are used to update E, while unlabeled ones are used to construct unsupervised models. Classes are predicted by a semi-supervised model Ex, which consists of E and the unsupervised models, in a consensus-maximization manner, so that the constraints from the unsupervised models improve performance when labeled instances are limited. Experiments on different datasets demonstrate that our method outperforms conventional methods in mining data streams.
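
The chunk-based part of this pipeline can be illustrated with a minimal sketch. It is not the authors' implementation: the names `chunks`, `ChunkEnsemble`, and `train_centroid` are hypothetical, the base learner is a toy one-dimensional nearest-centroid classifier, and the ensemble combines its members by plain majority vote. Novel-class detection, the unsupervised models, and the consensus step of Ex are deliberately left out.

```python
from collections import Counter
from itertools import islice


def chunks(stream, size):
    """Split a (possibly unbounded) stream into fixed-size chunks."""
    it = iter(stream)
    while True:
        chunk = list(islice(it, size))
        if not chunk:
            return
        yield chunk


def train_centroid(labeled_chunk):
    """Toy base learner (assumption): 1-D nearest-centroid classifier."""
    sums, counts = {}, {}
    for x, y in labeled_chunk:
        sums[y] = sums.get(y, 0.0) + x
        counts[y] = counts.get(y, 0) + 1
    centroids = {y: sums[y] / counts[y] for y in sums}
    # The returned model maps a value to the label of the closest centroid.
    return lambda x: min(centroids, key=lambda y: abs(x - centroids[y]))


class ChunkEnsemble:
    """Keeps the most recent `max_models` classifiers, one per labeled chunk."""

    def __init__(self, train, max_models=3):
        self.train = train
        self.max_models = max_models
        self.models = []

    def update(self, labeled_chunk):
        # Train on the new chunk and discard the oldest models,
        # so the ensemble follows the evolving stream.
        self.models.append(self.train(labeled_chunk))
        self.models = self.models[-self.max_models:]

    def predict(self, x):
        # Simple majority vote (not the paper's consensus scheme).
        votes = Counter(model(x) for model in self.models)
        return votes.most_common(1)[0][0]


# Usage: feed labeled chunks to the ensemble, then classify new values.
data = [(0.0, "a"), (1.0, "a"), (9.0, "b"), (10.0, "b")] * 3
ensemble = ChunkEnsemble(train_centroid, max_models=2)
for chunk in chunks(data, 4):
    ensemble.update(chunk)
```

Bounding the number of stored models keeps memory constant on an unbounded stream and lets older concepts age out. How the paper actually selects or replaces ensemble members is not stated in the abstract.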
