Abstract

Predicting online on data streams, with data flowing continuously, quickly, and in large quantities, is becoming increasingly important in tackling real-world problems. In such scenarios, the data distribution usually evolves over time, a situation known as concept drift. This article presents an empirical method, based on differential evolution, aimed at guiding users on how to tune concept drift detectors to improve their accuracy on data streams. It also suggests the best detectors, strategies, and/or parameterizations to use in different data stream scenarios. The most time-consuming part of the method was preprocessed and is based on experiments using eight artificial dataset generators, each with five versions of abrupt, fast gradual, and slow gradual concept drift (120 dataset versions in all), as well as six real-world datasets, for 11 different drift detectors and two different base learners. The use of the proposed method is illustrated mostly with the Drift Detection Method (DDM) together with Naive Bayes, but we compared the performances of all 11 detectors using their default parameter values and the several parameter sets prescribed by the method with both base classifiers, Naive Bayes and Hoeffding Trees. Results indicate that the predictive accuracy of the detectors tuned with the method increased considerably in many situations.
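To make the abstract concrete, the sketch below shows the standard Drift Detection Method (DDM) formulation that the article tunes: it tracks the running error rate of a classifier and signals a warning or a drift when the rate exceeds its historical minimum by two or three standard deviations, respectively. The class name, interface, and default values here (`warning_level=2.0`, `drift_level=3.0`, `min_samples=30`) are an illustrative assumption, not the paper's code; the parameters shown are exactly the kind the proposed method would tune.

```python
import math

class DDM:
    """Minimal sketch of the Drift Detection Method (Gama et al.).

    Monitors a stream of 0/1 prediction errors; the thresholds below are
    the conventional DDM defaults, not values prescribed by the article.
    """

    def __init__(self, warning_level=2.0, drift_level=3.0, min_samples=30):
        self.warning_level = warning_level  # std devs above minimum -> warning
        self.drift_level = drift_level      # std devs above minimum -> drift
        self.min_samples = min_samples      # samples before testing begins
        self.reset()

    def reset(self):
        self.n = 0
        self.p = 0.0                  # running error rate
        self.s = 0.0                  # its standard deviation
        self.p_min = float("inf")
        self.s_min = float("inf")

    def update(self, error):
        """Feed 1 for a misclassification, 0 for a correct prediction.

        Returns 'drift', 'warning', or 'stable'.
        """
        self.n += 1
        # Incremental estimate of the Bernoulli error rate and its std dev.
        self.p += (error - self.p) / self.n
        self.s = math.sqrt(self.p * (1 - self.p) / self.n)
        if self.n < self.min_samples:
            return "stable"
        # Track the minimum of p + s seen so far.
        if self.p + self.s < self.p_min + self.s_min:
            self.p_min, self.s_min = self.p, self.s
        if self.p + self.s > self.p_min + self.drift_level * self.s_min:
            self.reset()              # start a fresh context after a drift
            return "drift"
        if self.p + self.s > self.p_min + self.warning_level * self.s_min:
            return "warning"
        return "stable"
```

For example, feeding a stream with a 10% error rate followed by a sudden jump to 100% errors causes the detector to report `"drift"` shortly after the jump, which is the behavior the article's tuning procedure optimizes across its 120 dataset versions.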
