Abstract

In this digital era we are surrounded by social media applications and hardware devices (such as sensors) that generate data at an astonishing rate. This incoming data from heterogeneous sources is referred to as a data stream. Analysing data in motion (data streams) has become a new challenge in meeting the demands of real-time analytics. Conventional mining techniques are proving inefficient because the behaviour of the data itself has changed. Other challenges associated with data streams include resource constraints, such as memory and running time, and the need to process the data in a single scan. Because data streams are time-variant, applying any mining algorithm, such as classification, clustering, or indexing, in a single scan of the data is a tedious task. This paper focuses on the concept drift problem in the classification of streaming data. During classification, a change in the concept or distribution of the dataset over time is termed concept drift. The performance of a model or classifier degrades due to concept drift even in stationary data; dealing with this problem therefore becomes more challenging in data streams. This paper presents a categorization of existing streaming data classification algorithms along with their ability to handle concept drift. It also compares the tools available for simulating such problems and notes their limitations. The paper further lists the datasets and performance metrics that have been used in the literature for performance analysis. Thus, this paper may serve as a complete roadmap for researchers interested in designing new solutions to the concept drift problem in streaming data classification. It also highlights the open research questions in this field.

