Abstract

Data streams, one of the primary sources of big data, arrive continuously and at high speed. The biggest challenge in data stream mining is dealing with concept drift, for which ensemble methods are widely employed. Ensembles for handling concept drift fall into two categories: online and block-based approaches. The primary disadvantage of block-based ensembles lies in the difficulty of tuning the block size to provide a tradeoff between fast reactions to drifts and high accuracy in stable periods. Motivated by this challenge, we put forward an online ensemble paradigm that aims to combine the best elements of block-based weighting and online processing. The algorithm uses adaptive windowing as a change detector. Once a change is detected, a new classifier is built to replace the worst one in the ensemble. In experimental evaluations on both synthetic and real-world datasets, our method performs significantly better than other ensemble approaches.
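
As a rough illustration of the detect-and-replace loop described in the abstract, the following Python sketch processes a stream one example at a time, monitors the ensemble's error with a change detector, and swaps out the weakest member when a change is signalled. It is not the authors' implementation. `WindowDriftDetector` is a simplified sub-window comparison in the spirit of adaptive windowing rather than the full ADWIN algorithm, `MajorityClassLearner` is a deliberately trivial stand-in for a real incremental classifier, and all class names, the window length, `delta`, and the recent-accuracy weighting are assumptions made for illustration.

```python
import math
from collections import deque


class WindowDriftDetector:
    """Simplified adaptive-window change test (hypothetical; not full ADWIN)."""

    def __init__(self, max_len=200, delta=0.002, min_sub=5):
        self.window = deque(maxlen=max_len)
        self.delta = delta
        self.min_sub = min_sub

    def update(self, error):
        """Add a 0/1 error; return True if two sub-windows differ significantly."""
        self.window.append(error)
        items = list(self.window)
        n, total, head = len(items), sum(items), 0
        for cut in range(1, n):
            head += items[cut - 1]
            n0, n1 = cut, n - cut
            if n0 < self.min_sub or n1 < self.min_sub:
                continue
            # Hoeffding-style threshold on the difference of sub-window means.
            m = 1.0 / (1.0 / n0 + 1.0 / n1)
            eps = math.sqrt(math.log(4.0 / self.delta) / (2.0 * m))
            if abs(head / n0 - (total - head) / n1) > eps:
                # Drop the older sub-window so only recent data remains.
                self.window = deque(items[cut:], maxlen=self.window.maxlen)
                return True
        return False


class MajorityClassLearner:
    """Trivial incremental learner: predicts the most frequent label so far."""

    def __init__(self):
        self.counts = {}

    def predict(self, x):
        return max(self.counts, key=self.counts.get) if self.counts else 0

    def learn(self, x, y):
        self.counts[y] = self.counts.get(y, 0) + 1


class OnlineEnsemble:
    """Accuracy-weighted ensemble that replaces its worst member on drift."""

    def __init__(self, n_members=5, recent=50, make_learner=MajorityClassLearner):
        self.make_learner = make_learner
        self.recent = recent
        self.members = [make_learner() for _ in range(n_members)]
        self.errors = [deque(maxlen=recent) for _ in range(n_members)]
        self.detector = WindowDriftDetector()
        self.drifts = 0

    def _accuracy(self, i):
        errs = self.errors[i]
        return 1.0 - sum(errs) / len(errs) if errs else 0.5

    def predict(self, x):
        votes = {}
        for i, member in enumerate(self.members):
            label = member.predict(x)
            votes[label] = votes.get(label, 0.0) + self._accuracy(i)
        return max(votes, key=votes.get)

    def learn(self, x, y):
        # Test-then-train: score the ensemble before any member sees (x, y).
        drift = self.detector.update(int(self.predict(x) != y))
        for i, member in enumerate(self.members):
            self.errors[i].append(int(member.predict(x) != y))
            member.learn(x, y)
        if drift:
            self.drifts += 1
            worst = min(range(len(self.members)), key=self._accuracy)
            self.members[worst] = self.make_learner()
            self.members[worst].learn(x, y)
            self.errors[worst] = deque(maxlen=self.recent)


def run_demo():
    """Stream with an abrupt label flip at t=300; features are unused here."""
    ens, mistakes = OnlineEnsemble(), []
    for y in [1] * 300 + [0] * 300:
        x = ()
        mistakes.append(int(ens.predict(x) != y))
        ens.learn(x, y)
    return ens, mistakes


if __name__ == "__main__":
    ens, mistakes = run_demo()
    print("drifts:", ens.drifts, "mistakes:", sum(mistakes))
```

On this toy stream the ensemble keeps predicting the old label for a short while after the flip. Once the detector fires, a fresh member is added, and as the recent-accuracy weights of the stale members decay the fresh member comes to dominate the vote. A real system would plug an incremental classifier into `make_learner` in place of the majority-class stand-in.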

Highlights

  • In recent years, some promising computing paradigms have emerged to meet the needs of big data

  • Many practical applications, such as sensor networks [1], spam filtering [2], intrusion detection [3], and credit card fraud detection [4], generate continuously arriving data, known as data streams [5]

  • In the real world, data streams are usually generated in nonstationary environments, which means that the underlying distribution of the data can change arbitrarily over time


Summary

Introduction

Some promising computing paradigms have emerged to meet the needs of big data. In the real world, data streams are usually generated in nonstationary environments, which means that the underlying distribution of the data can change arbitrarily over time. This phenomenon is known as concept drift [7, 8] and is common in big data mining scenarios. For example, weather prediction models change with the seasons, and in recommender systems, user consumption patterns may shift over time with fashion, the economy, and so forth. The occurrence of such change leads to a drastic drop in classification accuracy. The learning models should be able to adapt to the changes quickly and
