Data Stream Clustering Techniques, Applications, and Models: Comparative Analysis and Discussion

Umesh Kokate, Parikshit Mahalle, Pramod Patil, Arvind Deshpande

doi:10.3390/bdcc2040032

Abstract

Data growth in today’s world is exponential: many applications, such as smart grids, sensor networks, video surveillance, financial systems, medical science data, web click streams, and network data, generate huge volumes of data streams at very high speed. In traditional data mining, the data set is generally static and can be read many times for processing and analysis. Data stream mining, however, has to satisfy constraints related to real-time response, bounded memory, single-pass processing, and concept-drift detection. The main problem is identifying hidden patterns and knowledge in continuous data streams in order to understand context and identify trends. In this paper, various data stream methods and algorithms are reviewed and evaluated on standard synthetic data streams and real-life data streams. Density-micro clustering and density-grid-based clustering algorithms are discussed, and a comparative analysis is performed in terms of various internal and external clustering evaluation methods. It was observed that no single algorithm can satisfy all the performance measures. The performance of these data stream clustering algorithms is domain-specific and depends on many parameters for density and noise thresholds.

Highlights

Nowadays automation is in almost every domain and transactions of everyday life are recorded at high speed
The statistics for an individual cluster, such as centroid x0, radius R, and diameter D, are used recursively in a multiphase clustering technique. The phases are: Phase 1: BIRCH builds a multilevel Clustering Feature (CF)-tree during the initial scan, compressing the data while preserving its inherent structure. Phase 2: A clustering algorithm is applied starting from the leaf nodes of the CF-tree; this removes sparse clusters as noise or outliers and groups dense nodes into clusters.
The basic definitions in DBSCAN are introduced in the following, where D is a current set of data points: Basic Definition
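The cluster statistics named in the BIRCH highlight above (centroid x0, radius R, diameter D) can all be derived from a compact clustering feature, which is what makes single-pass summarization possible. The following is a minimal sketch, assuming the standard BIRCH triple (N, LS, SS) with SS stored as the scalar sum of squared norms; the class and method names are illustrative and are not taken from the paper.

```python
import math
from dataclasses import dataclass, field
from typing import List


@dataclass
class ClusteringFeature:
    """Illustrative CF triple: point count, linear sum, sum of squared norms."""
    n: int = 0
    ls: List[float] = field(default_factory=list)
    ss: float = 0.0

    def add(self, point: List[float]) -> None:
        # Absorb one point in O(d) time without storing the point itself.
        if not self.ls:
            self.ls = [0.0] * len(point)
        self.n += 1
        self.ls = [a + b for a, b in zip(self.ls, point)]
        self.ss += sum(x * x for x in point)

    def merge(self, other: "ClusteringFeature") -> None:
        # CF additivity: the summary of a union is the component-wise sum.
        if not self.ls:
            self.ls = [0.0] * len(other.ls)
        self.n += other.n
        self.ls = [a + b for a, b in zip(self.ls, other.ls)]
        self.ss += other.ss

    def centroid(self) -> List[float]:
        # x0 = LS / N
        return [v / self.n for v in self.ls]

    def radius(self) -> float:
        # R = sqrt(SS / N - ||x0||^2): root mean squared distance to the centroid.
        c2 = sum(v * v for v in self.centroid())
        return math.sqrt(max(self.ss / self.n - c2, 0.0))

    def diameter(self) -> float:
        # D = sqrt((2 N SS - 2 ||LS||^2) / (N (N - 1))): root mean squared
        # pairwise distance between the points in the cluster.
        if self.n < 2:
            return 0.0
        ls2 = sum(v * v for v in self.ls)
        value = (2 * self.n * self.ss - 2 * ls2) / (self.n * (self.n - 1))
        return math.sqrt(max(value, 0.0))


cf = ClusteringFeature()
for p in ([0.0, 0.0], [2.0, 0.0]):
    cf.add(p)
print(cf.centroid(), cf.radius(), cf.diameter())
```

Because the triple is additive, two sub-clusters can be merged without revisiting their points, which is why a CF-tree can be maintained under the bounded-memory, single-pass constraints described in the abstract.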

Summary

Introduction

Nowadays automation is present in almost every domain, and the transactions of everyday life are recorded at high speed. Some review papers discuss density-based clustering techniques on data streams [20]. A past survey [21] reviewed density-based clustering techniques and methods for evolving data streams. The authors surveyed clustering algorithms used in different domains and their applications to benchmark datasets and computational problems. They discussed many closely related topics such as cluster validation and proximity measures. Their focus is on clustering techniques based on MapReduce and parallel classification using MapReduce. The authors of another past paper [22] used taxonomy and empirical analysis to survey clustering algorithms on big data.

Clustering Techniques

Partitional Clustering

Hierarchical Clustering

Density-Based

Grid-Based

Model-Based

Hierarchical Clustering Method

Density-Based Clustering Method

Grid-Based Clustering Method

Model-Based Clustering Methods

Evaluation Clustering Methods

Determining Number of Clusters in a Data Set

Measuring Clustering Quality

The Internal Measures for Evaluation of Clustering Quality

F Measure

The External Measure for Evaluation of Clustering Quality

Challenging Issues and Comparison

Experimentation with Data Streams

Journal: Big Data and Cognitive Computing	Publication Date: Oct 17, 2018
Citations: 44	License type: CC BY 4.0
