Abstract

Class imbalance and concept drift are two primary challenges that coexist in data stream classification. Although each issue has drawn considerable attention separately, their joint treatment remains largely unexplored. Moreover, class imbalance becomes harder to handle when the data stream is also subject to concept drift. A novel Cost-Sensitive based Data Stream (CSDS) classification method is introduced to address both issues simultaneously. CSDS uses cost information during both data preprocessing and classification. During data preprocessing, a cost-sensitive learning strategy is introduced into the ReliefF algorithm to alleviate class imbalance at the data level. During classification, a cost-sensitive weighting scheme is devised to improve the overall performance of the ensemble. In addition, a change detection mechanism is embedded in the algorithm so that the ensemble can capture and react to drift promptly. Experimental results show that our method obtains better classification results across different imbalanced, concept-drifting data stream scenarios.
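The idea of injecting cost information into ReliefF can be sketched as follows. The summary does not give the paper's cost formula, so this sketch assumes a hypothetical cost of n_majority / n_class that scales each instance's ReliefF update, making minority-class instances count more. The function name `cost_sensitive_relieff` is illustrative, and every instance is visited once instead of being randomly sampled.

```python
from collections import Counter


def cost_sensitive_relieff(X, y, k=3):
    """Return one relevance weight per feature (higher = more class-relevant).

    Hypothetical cost scheme: each instance's ReliefF update is scaled by
    n_majority / n_class, so minority-class instances count more.
    Every instance is used once (m = n) instead of random sampling.
    """
    n, d = len(X), len(X[0])
    counts = Counter(y)
    n_max = max(counts.values())
    cost = {c: n_max / cnt for c, cnt in counts.items()}
    prior = {c: cnt / n for c, cnt in counts.items()}

    # Per-feature ranges used to normalise attribute differences to [0, 1].
    lo = [min(row[a] for row in X) for a in range(d)]
    hi = [max(row[a] for row in X) for a in range(d)]

    def diff(a, u, v):
        span = hi[a] - lo[a]
        return abs(u[a] - v[a]) / span if span > 0 else 0.0

    def nearest(i, cls):
        # k nearest neighbours of instance i within class `cls` (i excluded).
        cands = [j for j in range(n) if j != i and y[j] == cls]
        cands.sort(key=lambda j: sum(diff(a, X[i], X[j]) for a in range(d)))
        return cands[:k]

    w = [0.0] * d
    for i in range(n):
        c_i = cost[y[i]]
        hits = nearest(i, y[i])
        misses = {c: nearest(i, c) for c in counts if c != y[i]}
        for a in range(d):
            # Near hits that differ on feature a lower its weight.
            if hits:
                w[a] -= c_i * sum(diff(a, X[i], X[h]) for h in hits) / (n * len(hits))
            # Near misses that differ on feature a raise it, weighted by class prior.
            for c, ms in misses.items():
                if ms:
                    p = prior[c] / (1.0 - prior[y[i]])
                    w[a] += c_i * p * sum(diff(a, X[i], X[m]) for m in ms) / (n * len(ms))
    return w


if __name__ == "__main__":
    # Feature 0 separates the classes; feature 1 is noise. Class 1 is the minority.
    X = [[0.0, 0.3], [0.1, 0.9], [0.2, 0.5], [0.15, 0.1], [0.05, 0.7],
         [0.9, 0.4], [1.0, 0.8]]
    y = [0, 0, 0, 0, 0, 1, 1]
    print(cost_sensitive_relieff(X, y, k=2))  # feature 0 gets the larger weight
```

Scaling by cost rather than oversampling keeps the data unchanged while still letting rare-class neighbourhoods shape the feature ranking.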

Highlights

  • Data stream classification has attracted much attention in big data mining because of its presence in many real-world fields, such as social network analysis, weather prediction, online medical diagnosis, and weblog mining [1,2,3,4,5]

  • Concept drift, a common feature of data streams [6,7,8,9], refers to the target concepts of a stream changing over time

  • We verified the effectiveness of Cost-Sensitive based Data Stream (CSDS) using cost-sensitive strategies in evolving data stream scenarios involving different types of drifts and class imbalance
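
Because concept drift means the target concept changes over time, a change detector usually watches the classifier's error stream and signals when the error rises significantly. The summary does not specify how CSDS detects change, so the sketch below substitutes the well-known DDM (Drift Detection Method, Gama et al., 2004) as a stand-in; the class name, default thresholds, and the simulated stream are illustrative assumptions.

```python
import math


class DDM:
    """Minimal Drift Detection Method (Gama et al., 2004).

    Call update(1) for a misclassified instance and update(0) for a correct one.
    """

    def __init__(self, min_instances=30, warning_level=2.0, drift_level=3.0):
        self.min_instances = min_instances
        self.warning_level = warning_level
        self.drift_level = drift_level
        self.reset()

    def reset(self):
        self.n = 0
        self.p = 0.0  # running error rate
        self.s = 0.0  # binomial standard deviation of the error rate
        self.p_min = math.inf
        self.s_min = math.inf

    def update(self, error):
        """Return 'drift', 'warning', or 'stable' after one prediction."""
        self.n += 1
        self.p += (error - self.p) / self.n
        self.s = math.sqrt(self.p * (1.0 - self.p) / self.n)
        if self.n < self.min_instances:
            return "stable"
        # Remember the best (lowest) error level seen under the current concept.
        if self.p + self.s < self.p_min + self.s_min:
            self.p_min, self.s_min = self.p, self.s
        if self.p + self.s > self.p_min + self.drift_level * self.s_min:
            self.reset()  # start tracking the new concept from scratch
            return "drift"
        if self.p + self.s > self.p_min + self.warning_level * self.s_min:
            return "warning"
        return "stable"


if __name__ == "__main__":
    det = DDM()
    # Error rate 10% for 300 instances, then 50%: a simulated abrupt drift.
    errors = [int(i % 10 == 0) for i in range(300)] + [int(i % 2 == 0) for i in range(300, 600)]
    states = [det.update(e) for e in errors]
    print(states.index("drift"))  # first position flagged as drift
```

In an ensemble setting, a "drift" signal would typically trigger training a fresh member or reweighting existing ones; how CSDS reacts is described in the paper itself.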

Summary

Introduction

Data stream classification has attracted much attention in big data mining because of its presence in many real-world fields, such as social network analysis, weather prediction, online medical diagnosis, and weblog mining [1,2,3,4,5]. A growing number of methodologies have been proposed for dealing with concept drift [9]. Among these techniques, window-based methods use a natural forgetting mechanism: new instances are added to the window and outdated instances are discarded. Popular methods for dealing with class imbalance [13,14,15,16,17,18] fall into three main groups: data-level techniques, cost-sensitive learning, and ensemble methods. Constructing classifiers for evolving data streams with class imbalance is not a trivial task; it requires addressing several subproblems, beginning with how concept drift should be handled. In the proposed method, a dynamic cost-sensitive weighting mechanism is developed for the classification stage, incorporating cost values into learning to alleviate class imbalance at the algorithm level. Our algorithm was evaluated on different kinds of class-imbalanced data stream benchmarks, and the results demonstrated that CSDS achieves the best overall performance in G-mean, running time, and adaptation to concept drift.
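The dynamic cost-sensitive weighting and the G-mean metric can be illustrated with a minimal sketch. It assumes, hypothetically, that class costs are inverse class frequencies in the newest chunk and that each member's weight is its cost-weighted accuracy on that chunk; these are not necessarily the formulas used in CSDS, and the helper names are illustrative. `g_mean` computes the standard geometric mean of per-class recalls.

```python
import math
from collections import Counter, defaultdict


def class_costs(y):
    """Hypothetical misclassification costs: inverse class frequency in the chunk."""
    counts = Counter(y)
    n, k = len(y), len(counts)
    return {c: n / (k * cnt) for c, cnt in counts.items()}


def member_weight(predict, X, y, costs):
    """Cost-weighted accuracy of one ensemble member on the newest chunk."""
    total = sum(costs[t] for t in y)
    correct = sum(costs[t] for x, t in zip(X, y) if predict(x) == t)
    return correct / total


def weighted_vote(members, weights, x):
    """Predict the class with the largest summed member weight."""
    scores = defaultdict(float)
    for predict, w in zip(members, weights):
        scores[predict(x)] += w
    return max(scores, key=scores.get)


def g_mean(y_true, y_pred):
    """Geometric mean of per-class recalls (0 if any class is never recognised)."""
    recalls = []
    for c in sorted(set(y_true)):
        idx = [i for i, t in enumerate(y_true) if t == c]
        recalls.append(sum(y_pred[i] == c for i in idx) / len(idx))
    return math.prod(recalls) ** (1.0 / len(recalls))


if __name__ == "__main__":
    X = [0.1, 0.2, 0.3, 0.4, 0.15, 0.25, 0.35, 0.45, 0.8, 0.9]
    y = [0] * 8 + [1] * 2  # imbalanced chunk: class 1 is the minority

    def majority(x):
        return 0

    def threshold(x):
        return int(x > 0.5)

    members = [majority, threshold]
    costs = class_costs(y)
    weights = [member_weight(m, X, y, costs) for m in members]
    print(weights)  # [0.5, 1.0]; plain accuracy would rate `majority` at 0.8
    print([g_mean(y, [m(x) for x in X]) for m in members])  # [0.0, 1.0]
    print(weighted_vote(members, weights, 0.9))  # 1
```

The example shows why cost information matters: a member that ignores the minority class looks strong under plain accuracy (0.8) but receives half the weight under cost-weighted accuracy and a G-mean of 0.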

Related Work
Our Method
Table: comparison of KUE and CSDS on the HyperPlane, SEA, LED, Rotating spiral, Spam, Sensor, Electricity, and Airlines streams, with average rank.