Abstract

Data stream classification has become a promising prediction task with relevance to many practical settings. However, concept drift and noise pose significant challenges for it. We therefore present a new incremental ensemble model for classifying nonstationary data streams with noise. Our approach integrates three strategies: incremental learning to monitor and adapt to concept drift; ensemble learning to improve model stability; and a microclustering procedure that distinguishes drift from noise and predicts the labels of incoming instances via majority vote. Experiments with two synthetic datasets designed to test for both gradual and abrupt drift show that our method classifies noisy, nonstationary data streams more accurately than two popular baselines.

Highlights

  • In nonstationary data stream environments, these investigations have addressed some of the open problems, including concept drift, the curse of dimensionality, and imbalanced learning

  • An incremental learning strategy, combined with ensemble learning and a smoothing operator, adapts the model to concept drift, distinguishes noise, and maintains stability

  • Following the above definition, we seek a high-accuracy mapping function f: X ⟶ y, that is, a classification model that outputs the class label of an incoming instance X

Summary

Related Work

A good data stream classification approach should be able to learn incrementally and adapt to concept drift [18]. Based on the above analysis, our solution combines three strategies, as illustrated in Figure 2: incremental learning to track concept drift; ensemble learning to enhance the model’s stability; and a microclustering method to distinguish drift from noise and make the final label predictions. This scenario suggests that the incoming instance is drawn from a different joint probability distribution. Summarizing the above phases and scenarios of the data stream classification model, the microcluster-based incremental ensemble classification algorithm, named MCBIE, is given in Algorithm 1. Previous instances are not retained over time; only the statistical information of each microcluster, such as SS, LS, and Ci d, is recorded, which saves storage memory.
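
The idea of keeping only summary statistics per microcluster can be sketched as follows. This is a minimal illustration, not the MCBIE algorithm itself: it assumes that LS and SS denote the per-dimension linear sum and sum of squares (as in standard cluster-feature summaries), and the class name `MicroCluster`, the function `predict`, and the parameter `k` are hypothetical. The majority vote over the nearest microclusters is one plausible reading of the voting step, not a detail confirmed by the text above.

```python
import math
from collections import Counter


class MicroCluster:
    """Summary statistics for one labelled microcluster (hypothetical layout)."""

    def __init__(self, x, label):
        self.n = 1                      # number of absorbed instances
        self.ls = list(x)               # LS: per-dimension linear sum (assumed meaning)
        self.ss = [v * v for v in x]    # SS: per-dimension sum of squares (assumed meaning)
        self.label = label              # class label carried by the cluster

    def absorb(self, x):
        """Update the statistics in place; the instance itself is not stored."""
        self.n += 1
        for i, v in enumerate(x):
            self.ls[i] += v
            self.ss[i] += v * v

    def centroid(self):
        return [s / self.n for s in self.ls]

    def radius(self):
        """Root-mean-square deviation from the centroid, computed from LS and SS only."""
        var = sum(ss / self.n - (ls / self.n) ** 2
                  for ls, ss in zip(self.ls, self.ss))
        return math.sqrt(max(var, 0.0))  # guard against tiny negative rounding errors


def predict(clusters, x, k=3):
    """Label x by majority vote among the k microclusters with the nearest centroids."""
    nearest = sorted(clusters, key=lambda c: math.dist(c.centroid(), x))[:k]
    votes = Counter(c.label for c in nearest)
    return votes.most_common(1)[0][0]
```

Because the centroid and radius are derived from `n`, `ls`, and `ss` alone, memory use grows with the number of microclusters rather than with the length of the stream.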

Experiments
Experiment 1
Experiment 2
Findings
Conclusions