Abstract

With the increasing volume and dimensionality of data, traditional classification algorithms can no longer satisfy the demands of practical data stream classification applications. To deal with noise and concept drift in data streams, we propose an ensemble classification algorithm based on attribute reduction and a sliding window. Using mutual information, an approximate attribute reduction algorithm based on rough sets reduces data dimensionality and increases the diversity of the reduced results. A double-threshold concept drift detection method and a three-stage sliding window control strategy are introduced to improve performance when dealing with both noise and concept drift. Classification precision is further improved by updating the base classifiers and their nonlinear weights. Experiments on synthetic and real-world datasets demonstrate the algorithm's performance in terms of classification precision, memory use, and time efficiency.
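The double-threshold detection mentioned above can be illustrated with a generic two-level (warning/drift) detector in the style of DDM; this is a minimal sketch of the general idea, and the class name, threshold values, and statistics used here are assumptions, not the paper's exact method or window control strategy:

```python
import math

class DoubleThresholdDetector:
    # Generic two-level drift detector: a lower "warning" threshold and a
    # higher "drift" threshold over the running error rate, in the style of
    # DDM. Not the paper's exact detector; thresholds are illustrative.
    def __init__(self, warn_level=2.0, drift_level=3.0):
        self.warn_level, self.drift_level = warn_level, drift_level
        self.n = 0
        self.errors = 0
        self.p_min = float("inf")   # lowest error rate seen so far
        self.s_min = float("inf")   # its standard deviation

    def update(self, is_error):
        # Feed one prediction outcome; return "stable", "warning", or "drift".
        self.n += 1
        self.errors += int(is_error)
        p = self.errors / self.n
        s = math.sqrt(p * (1.0 - p) / self.n)
        if p + s < self.p_min + self.s_min:
            self.p_min, self.s_min = p, s
        if p + s > self.p_min + self.drift_level * self.s_min:
            return "drift"
        if p + s > self.p_min + self.warn_level * self.s_min:
            return "warning"
        return "stable"

# Toy usage: a stable ~10% error phase, then a sudden all-error phase.
det = DoubleThresholdDetector()
for i in range(100):
    det.update(i % 10 == 0)                       # concept is stable
statuses = [det.update(True) for _ in range(50)]  # concept has changed
print("drift" in statuses)                        # detector flags the change
```

In a sliding-window setting, the "warning" level would typically shrink or freeze the window, while the "drift" level would trigger rebuilding the affected base classifiers.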

Highlights

  • With the emerging era of Big Data, data stream classification technology has become a main topic in data mining research

  • We propose an ensemble classification algorithm of data streams based on attribute reduction and a sliding window

  • We propose a rough set-based approximate attribute reduction algorithm to decrease the computational complexity of the ensemble classification algorithm and increase the diversity of the base classifiers

Introduction

With the emerging era of Big Data, data stream classification has become a major topic in data mining research. Recent studies mainly employ two types of methods to classify data streams: single-classifier learning and ensemble-classifier learning [1,2]. The streaming ensemble algorithm was first introduced in Reference [4] to deal with concept drift in data stream classification by updating the base classifiers. The accuracy weighted ensemble (AWE) algorithm was further proposed [3] to weight base classifiers according to their classification error rates. The accuracy update ensemble algorithm [5] uses nonlinear weights for the base classifiers and achieves higher precision and lower memory use than AWE. In Reference [2], one linear function and one nonlinear function are used to weight base classifiers as input data arrive under different conditions.
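The AWE-style weighting described above can be sketched as follows: each base classifier is weighted by how much its mean squared error on the newest data chunk beats that of a random classifier predicting from the class distribution. This is a minimal sketch of that idea, not the cited papers' implementations; the function names and the toy classifiers are illustrative assumptions:

```python
def mse_r(class_priors):
    # Reference error of a classifier that predicts classes at random
    # according to their prior probabilities p(c): sum of p(c)*(1-p(c))^2.
    return sum(p * (1.0 - p) ** 2 for p in class_priors.values())

def mse_i(predict_proba, chunk):
    # Mean squared error of one base classifier on a labeled chunk
    # [(x, y), ...]: average of (1 - probability given to the true class)^2.
    errs = [(1.0 - predict_proba(x).get(y, 0.0)) ** 2 for x, y in chunk]
    return sum(errs) / len(errs)

def awe_weight(predict_proba, chunk, class_priors):
    # Weight = how much better than random the classifier is on the newest
    # chunk; classifiers no better than random get weight 0 (pruned).
    return max(0.0, mse_r(class_priors) - mse_i(predict_proba, chunk))

# Toy usage: a perfect classifier vs. a chance-level one on a balanced chunk.
chunk = [(0, 0), (1, 1), (2, 0), (3, 1)]
priors = {0: 0.5, 1: 0.5}
perfect = lambda x: {x % 2: 1.0}       # always right on this chunk
uniform = lambda x: {0: 0.5, 1: 0.5}   # no better than chance
print(awe_weight(perfect, chunk, priors))  # 0.25
print(awe_weight(uniform, chunk, priors))  # 0.0
```

Nonlinear weighting schemes such as that of [5] keep this better-than-random baseline but map the error difference through a nonlinear function, so recently accurate classifiers dominate the ensemble vote more strongly.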
