Abstract

Most data stream learning methods assume that the true class of an incoming instance becomes available right after it has been processed. However, the assumption of unlimited access to class labels is unrealistic, as labeling carries a very high cost. This is the driving force behind the growing development of methods that require reduced or no access to class labels. Among several potential directions, active learning emerges as a promising solution, as it allows selecting the most valuable instances from the stream while issuing as few label queries as possible. Despite numerous proposals of active learning methods for static data, this domain is still developing for data streams. Here, the non-stationary nature of data must be taken into consideration, and proposed algorithms must accommodate potential occurrences of concept drift. In this paper we propose a Query by Committee active learning strategy adapted to online learning from drifting data streams. A decision regarding a label query is made by an ensemble of classifiers instead of a single learner, leading to improved instance selection. We present four different approaches to online Query by Committee and evaluate their usefulness on the basis of the accuracy obtained under limited budgets and their ability to handle concept drift. We introduce Budget Loss of Accuracy, a novel measure for evaluating active learning algorithms. Finally, we investigate the relationships between the efficacy of Query by Committee models and the diversity of the underlying ensembles. Based on a thorough experimental investigation, we show the usefulness of the proposed algorithms for reducing labeling effort in learning from drifting data streams.
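To make the committee-based query decision concrete, the sketch below shows a generic online Query by Committee selector: a small ensemble of incremental classifiers votes on each incoming instance, and a label is requested only when the normalized vote entropy exceeds a threshold and the labeling budget has not been spent. The committee members (scikit-learn's GaussianNB trained with partial_fit), the vote-entropy criterion, the fixed budget, and the Poisson-based instance weighting are illustrative assumptions for a minimal sketch; they are not the four strategies proposed in the paper.

    # Minimal sketch of online Query by Committee with a labeling budget.
    # All design choices below (GaussianNB members, vote entropy, Poisson
    # weighting) are assumptions for illustration, not the paper's algorithms.
    import numpy as np
    from sklearn.naive_bayes import GaussianNB

    class OnlineQueryByCommittee:
        def __init__(self, n_members=5, classes=(0, 1), budget=0.1,
                     threshold=0.5, seed=0):
            self.members = [GaussianNB() for _ in range(n_members)]
            self.classes = np.asarray(classes)
            self.budget = budget        # max fraction of instances we may label
            self.threshold = threshold  # minimum vote entropy to trigger a query
            self.rng = np.random.default_rng(seed)
            self.seen = 0               # instances observed so far
            self.queried = 0            # labels requested so far
            self.warmed_up = False

        def _vote_entropy(self, x):
            # Disagreement of the committee, normalized to [0, 1].
            votes = np.array([m.predict(x.reshape(1, -1))[0] for m in self.members])
            counts = np.array([(votes == c).sum() for c in self.classes], dtype=float)
            p = counts / counts.sum()
            p = p[p > 0]
            return float(-(p * np.log(p)).sum() / np.log(len(self.classes)))

        def should_query(self, x):
            """Decide whether to request the true label of instance x."""
            self.seen += 1
            if not self.warmed_up:
                return True   # bootstrap: label the first instance(s) to initialize members
            if self.queried / self.seen >= self.budget:
                return False  # labeling budget currently exhausted
            return self._vote_entropy(x) >= self.threshold

        def update(self, x, y):
            """Train committee members on a labelled instance."""
            self.queried += 1
            for m in self.members:
                # Poisson(1) instance weighting (online-bagging style) keeps members
                # diverse; during warm-up every member sees the instance at least once.
                k = self.rng.poisson(1.0) if self.warmed_up else 1
                for _ in range(int(k)):
                    m.partial_fit(x.reshape(1, -1), [y], classes=self.classes)
            self.warmed_up = True

In use, each incoming instance x is first passed to should_query(x); only if it returns True is the true label obtained and update(x, y) called, so the fraction of labeled instances stays near the budget while queries concentrate on instances the committee disagrees about.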
