Abstract

With the increasing volume and dimensionality of data, traditional classification algorithms can no longer satisfy the demands of practical data stream classification applications. To deal with noise and concept drift in data streams, we propose an ensemble classification algorithm based on attribute reduction and a sliding window. Using mutual information, an approximate attribute reduction algorithm based on rough sets reduces data dimensionality and increases the diversity of the reduced results. A double-threshold concept drift detection method and a three-stage sliding window control strategy are introduced to improve performance when dealing with both noise and concept drift. Classification precision is further improved by updating the base classifiers and their nonlinear weights. Experiments on synthetic and real-world datasets demonstrate the algorithm's performance in terms of classification precision, memory use, and time efficiency.
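The double-threshold detection mentioned above can be illustrated with a generic two-level (warning/drift) detector in the style of DDM; this is a minimal sketch of the general idea, and the class name, threshold values, and statistics used here are assumptions, not the paper's exact method or window control strategy:

```python
import math

class DoubleThresholdDetector:
    # Generic two-level drift detector: a lower "warning" threshold and a
    # higher "drift" threshold over the running error rate, in the style of
    # DDM. Not the paper's exact detector; thresholds are illustrative.
    def __init__(self, warn_level=2.0, drift_level=3.0):
        self.warn_level, self.drift_level = warn_level, drift_level
        self.n = 0
        self.errors = 0
        self.p_min = float("inf")   # lowest error rate seen so far
        self.s_min = float("inf")   # its standard deviation

    def update(self, is_error):
        # Feed one prediction outcome; return "stable", "warning", or "drift".
        self.n += 1
        self.errors += int(is_error)
        p = self.errors / self.n
        s = math.sqrt(p * (1.0 - p) / self.n)
        if p + s < self.p_min + self.s_min:
            self.p_min, self.s_min = p, s
        if p + s > self.p_min + self.drift_level * self.s_min:
            return "drift"
        if p + s > self.p_min + self.warn_level * self.s_min:
            return "warning"
        return "stable"

# Toy usage: a stable ~10% error phase, then a sudden all-error phase.
det = DoubleThresholdDetector()
for i in range(100):
    det.update(i % 10 == 0)                       # concept is stable
statuses = [det.update(True) for _ in range(50)]  # concept has changed
print("drift" in statuses)                        # detector flags the change
```

In a sliding-window setting, the "warning" level would typically shrink or freeze the window, while the "drift" level would trigger rebuilding the affected base classifiers.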

Highlights

  • With the emerging era of Big Data, data stream classification technology has become a main topic in data mining research

  • We propose an ensemble classification algorithm of data streams based on attribute reduction and a sliding window

  • We propose a rough set-based approximate attribute reduction algorithm to decrease the computational complexity of the ensemble classification algorithm and increase the diversity of the base classifiers

Introduction

With the emerging era of Big Data, data stream classification has become a major topic in data mining research. Recent studies mainly employ two types of methods to classify data streams: single-classifier learning and ensemble-classifier learning [1,2]. The streaming ensemble algorithm was first introduced in Reference [4] to deal with concept drift in data stream classification by updating the base classifiers. The accuracy weighted ensemble (AWE) algorithm was further proposed [3] to weight base classifiers according to their classification error rates. The accuracy update ensemble algorithm [5] uses nonlinear weights for the base classifiers and achieves higher precision and lower memory use than AWE. In Reference [2], one linear function and one nonlinear function are used to weight base classifiers as input data arrive under different conditions.
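The AWE-style weighting described above can be sketched as follows: each base classifier is weighted by how much its mean squared error on the newest data chunk beats that of a random classifier predicting from the class distribution. This is a minimal sketch of that idea, not the cited papers' implementations; the function names and the toy classifiers are illustrative assumptions:

```python
def mse_r(class_priors):
    # Reference error of a classifier that predicts classes at random
    # according to their prior probabilities p(c): sum of p(c)*(1-p(c))^2.
    return sum(p * (1.0 - p) ** 2 for p in class_priors.values())

def mse_i(predict_proba, chunk):
    # Mean squared error of one base classifier on a labeled chunk
    # [(x, y), ...]: average of (1 - probability given to the true class)^2.
    errs = [(1.0 - predict_proba(x).get(y, 0.0)) ** 2 for x, y in chunk]
    return sum(errs) / len(errs)

def awe_weight(predict_proba, chunk, class_priors):
    # Weight = how much better than random the classifier is on the newest
    # chunk; classifiers no better than random get weight 0 (pruned).
    return max(0.0, mse_r(class_priors) - mse_i(predict_proba, chunk))

# Toy usage: a perfect classifier vs. a chance-level one on a balanced chunk.
chunk = [(0, 0), (1, 1), (2, 0), (3, 1)]
priors = {0: 0.5, 1: 0.5}
perfect = lambda x: {x % 2: 1.0}       # always right on this chunk
uniform = lambda x: {0: 0.5, 1: 0.5}   # no better than chance
print(awe_weight(perfect, chunk, priors))  # 0.25
print(awe_weight(uniform, chunk, priors))  # 0.0
```

Nonlinear weighting schemes such as that of [5] keep this better-than-random baseline but map the error difference through a nonlinear function, so recently accurate classifiers dominate the ensemble vote more strongly.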
