Abstract

Ensemble techniques are a powerful method for recognizing and reacting to changes in non-stationary data. However, most research into dynamic classification with ensembles assumes that the true class label of each incoming point is available or easily obtained. This is unrealistic in most practical applications, especially in high-velocity streams where manually labeling each point is prohibitively expensive. To address this challenge, this paper proposes an algorithm, Clustering and One-Class Classification Ensemble Learning (COCEL), which combines a stream clustering algorithm and an ensemble of one-class classifiers with active learning to classify dynamic data streams. The method exploits the intuitive relationship between clusters and one-class classifiers to cope with a small (or empty) training set and to improve with experience, modifying its own internal state as the data stream changes. The proposed method is evaluated on synthetic data streams exhibiting concept evolution and concept drift, and on a collection of high-velocity real data streams where manually labeling each incoming point is infeasible or expensive and labor-intensive. Finally, a comparative evaluation against peer stream classification ensembles shows that COCEL can achieve superior or comparable accuracy while typically requiring less than 0.01% of the stream labels.
