Abstract

Data streams are fast-growing, high-speed sequences of data points that are typically characterized by unbounded length, large volume, and, in particular, unstable data distributions. Ensemble learning, a prevalent classification approach, is widely used in data stream mining. Beyond their strong performance in providing a collection of diverse and accurate classifiers, ensemble learning algorithms are particularly effective at handling non-stationary data streams. Because an ensemble is built from components that can be dynamically updated, this family of methods is well suited to learning the changing concepts of the data. Several reviews have addressed the challenges of non-stationary data streams; however, none of the available surveys is dedicated to examining the effect of ensemble learning models on concept drift handling. This paper aims to provide a thorough theoretical and experimental review of the most significant ensemble-based data stream classification approaches in confronting various types of concept drift. In this comprehensive experimental analysis, 21 state-of-the-art ensemble algorithms are tested on 30 synthetic datasets under various types and volumes of concept drift. The analysis covers the classification accuracy, kappa score, and running time of the algorithms, and includes various statistical tests. In addition to the experimental analysis, the paper comprehensively investigates the most significant challenges posed by non-stationary data streams, offering the reader valuable insights.
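The abstract reports a kappa score alongside classification accuracy. As a minimal sketch, assuming the standard Cohen's kappa definition (stream-specific variants such as kappa-temporal are not covered), kappa corrects observed agreement p0 for the agreement pc expected by chance: kappa = (p0 - pc) / (1 - pc). The function name `accuracy_and_kappa` is hypothetical and not taken from the paper.

```python
from collections import Counter


def accuracy_and_kappa(y_true, y_pred):
    """Return (accuracy, Cohen's kappa) for two equal-length label sequences.

    Hypothetical helper; assumes the standard Cohen's kappa definition.
    """
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("need two non-empty sequences of equal length")
    n = len(y_true)
    # p0: observed agreement, i.e. plain classification accuracy
    p0 = sum(t == p for t, p in zip(y_true, y_pred)) / n
    # pc: agreement expected by chance, from the marginal label frequencies
    true_counts = Counter(y_true)
    pred_counts = Counter(y_pred)
    pc = sum(true_counts[c] * pred_counts[c] for c in true_counts) / (n * n)
    if pc == 1.0:
        # Degenerate case (one class everywhere): kappa is undefined;
        # returning 0.0 here is an assumed convention.
        return p0, 0.0
    return p0, (p0 - pc) / (1 - pc)


acc, kappa = accuracy_and_kappa([0, 0, 1, 1, 1, 0, 1, 0], [0, 1, 1, 1, 0, 0, 1, 0])
print(acc, kappa)
```

In this example the accuracy is 0.75 but kappa is only 0.5, because half of the agreement would be expected by chance given the label frequencies. This correction is why kappa is often reported next to accuracy for streams whose class distribution shifts or is imbalanced.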
