A review on real time data stream classification and adapting to various concept drift scenarios

Priyanka B. Dongre, Latesh G. Malik

doi:10.1109/iadcc.2014.6779381

Abstract

Data streams are viewed as sequences of relational tuples (e.g., sensor readings, call records, web page visits) that arrive continuously, at time-varying rates, in possibly unbounded streams. These streams are potentially huge, so many conventional data mining techniques cannot be applied to them directly. Classification techniques in particular fail to process data streams successfully because of two factors: their overwhelming volume and a distinctive feature known as concept drift. Concept drift describes changes over time in the structure being learned. The occurrence of concept drift leads to a drastic drop in classification accuracy. The recognition of concept drift in data streams has led to sliding-window approaches; other approaches to mining data streams with concept drift include instance selection methods, drift detection, ensemble classifiers, option trees, and the use of Hoeffding bounds to estimate classifier performance. This paper describes the types of concept drift that affect data examples and discusses approaches for handling concept drift scenarios. Its aim is to review and compare single-classifier and ensemble approaches to data stream mining.
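
As a rough illustration of how a sliding window and a Hoeffding bound can be combined to flag concept drift, the following minimal Python sketch compares a classifier's error rate on a recent window against a reference window. It is not a method taken from this paper: the class name `WindowDriftCheck`, the window size, the confidence parameter `delta`, and the decision rule (difference in error rates exceeding twice the bound) are all assumptions chosen for the example.

```python
import math
from collections import deque


def hoeffding_bound(value_range: float, delta: float, n: int) -> float:
    """Return epsilon such that, with probability at least 1 - delta, the
    mean of n independent observations with the given range lies within
    epsilon of the true mean."""
    return math.sqrt(value_range ** 2 * math.log(1.0 / delta) / (2.0 * n))


class WindowDriftCheck:
    """Hypothetical helper (not from the paper): flags drift when the error
    rate in a recent window exceeds the error rate in a fixed reference
    window by more than twice the Hoeffding bound."""

    def __init__(self, window_size: int = 100, delta: float = 0.05):
        self.window_size = window_size
        self.delta = delta
        self.reference = deque(maxlen=window_size)  # first errors seen
        self.recent = deque(maxlen=window_size)     # sliding window

    def update(self, error: int) -> bool:
        """Feed one 0/1 prediction error; return True if drift is flagged."""
        # Fill the reference window first; no decision is possible yet.
        if len(self.reference) < self.window_size:
            self.reference.append(error)
            return False
        # Afterwards, slide the recent window over the incoming errors.
        self.recent.append(error)
        if len(self.recent) < self.window_size:
            return False
        ref_mean = sum(self.reference) / len(self.reference)
        rec_mean = sum(self.recent) / len(self.recent)
        # Errors are 0 or 1, so the value range is 1.0; each window mean
        # carries its own bound, hence the factor of two.
        epsilon = 2.0 * hoeffding_bound(1.0, self.delta, self.window_size)
        return rec_mean - ref_mean > epsilon
```

In use, `update` would be called once per classified example with 1 for a misclassification and 0 otherwise; a return value of `True` would be the signal to retrain or replace the classifier. A practical detector would also need to reset or refresh the reference window after drift is flagged, which this sketch omits.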

Full Text