Abstract

Combining several classifiers trained on sequential chunks of training instances is a popular strategy for data stream mining with concept drifts. This paper introduces human recalling and forgetting mechanisms into a data stream mining system and proposes a Memorizing Based Data Stream Mining (MDSM) model. In this model, each component classifier is regarded as a piece of knowledge that a human obtains by studying some material, and it carries a memory retention value that reflects how useful it has been in the past. Classifiers with high memory retention values are kept in a “knowledge repository.” When a new data chunk arrives, the most useful classifiers are selected (recalled) from the repository to compose the current target ensemble. Based on MDSM, we put forward a new algorithm, MAE (Memorizing Based Adaptive Ensemble), which uses the Ebbinghaus forgetting curve as the forgetting mechanism and adopts ensemble pruning as the recalling mechanism. In experiments against four popular data stream mining approaches on datasets with different concept drifts, MAE achieves high and stable predictive accuracy, especially for applications with recurring or complex concept drifts. The results also demonstrate the effectiveness of the MDSM model.

Highlights

  • Classification is one of the main applications of machine learning

  • Inspired by human recalling and forgetting mechanisms, we propose a new model, MDSM (Memorizing Based Data Stream Mining), which introduces human memorizing characteristics into data stream mining


Summary

Introduction

Classification is one of the main applications of machine learning. Traditional classification methods are designed for static environments in which the whole training data is available to the learning system. Many approaches have been proposed to handle data streams with concept drift, including sliding window approaches [4, 5], drift detection techniques [6,7,8], and adaptive ensembles [2, 9,10,11,12,13,14]. Learn++.NSE [11] and Bagging++ [12] are another kind of chunk-based ensemble algorithm, in which no pruning is used to limit the number of component classifiers. As a result, they require considerable memory and testing time. In the MDSM model, a component classifier that has low accuracy on the current data chunk can still be kept in the knowledge repository if its memory retention is high enough. This prevents useful classifiers from being discarded when sudden concept drifts occur and improves the stability of data stream mining.
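
The idea of keeping a classifier because of its memory retention, not its accuracy on the latest chunk, can be illustrated with a minimal sketch. This is not the MAE algorithm itself: the decay form R = exp(-t / S) is the textbook Ebbinghaus-style curve, and the class names, the strength update (+1 per recall), and the 0.2 threshold are all assumptions made for illustration. In MAE the recalled set would come from ensemble pruning; here it is simply passed in by hand.

```python
import math
from dataclasses import dataclass, field


@dataclass
class MemoryItem:
    """One stored classifier plus its retention bookkeeping (hypothetical)."""
    name: str
    strength: float = 1.0   # assumed: grows each time the item is recalled
    last_recalled: int = 0  # index of the chunk at which it was last recalled

    def retention(self, now: int) -> float:
        # Ebbinghaus-style exponential decay, R = exp(-t / S)
        elapsed = now - self.last_recalled
        return math.exp(-elapsed / self.strength)


@dataclass
class KnowledgeRepository:
    """Keeps items whose retention stays above a threshold (assumed rule)."""
    threshold: float = 0.2
    items: list = field(default_factory=list)

    def add(self, name: str, now: int) -> None:
        self.items.append(MemoryItem(name, last_recalled=now))

    def recall(self, names: set, now: int) -> None:
        # Recalling an item reinforces it and restarts its decay.
        for item in self.items:
            if item.name in names:
                item.strength += 1.0
                item.last_recalled = now

    def forget(self, now: int) -> None:
        # Drop items whose retention has decayed below the threshold.
        self.items = [m for m in self.items if m.retention(now) >= self.threshold]


# Two classifiers learned at chunk 0; only "c0" is recalled at chunk 1.
repo = KnowledgeRepository(threshold=0.2)
repo.add("c0", now=0)
repo.add("c1", now=0)
repo.recall({"c0"}, now=1)
repo.forget(now=3)
surviving = [m.name for m in repo.items]
```

In this toy run "c0" survives to chunk 3 even though it was not used on chunks 2 and 3, because the earlier recall raised its strength and slowed its decay, while "c1" is forgotten. The point of the sketch is that retention depends on usage history, so a classifier that performs poorly on one chunk is not discarded immediately.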

Related Work
MDSM: A New Data Stream Mining Model
MAE Algorithm
Experimental Setup
Results and Discussion
Conclusions