AdaHash: hashing-based scalable, adaptive hierarchical clustering of streaming data on Mapreduce frameworks

Dean Teffer, Ravi Srinivasan, Joydeep Ghosh

doi:10.1007/s41060-018-0145-7

Abstract

Despite the recent growth in large-scale, distributed streaming data processing, there are currently limited options for flexible clustering of streaming data on industrial production frameworks such as MapReduce and Spark. The clustering methods being used by practitioners on such systems do not respond rapidly to new data, or do not adjust the number of clusters appropriately as more data are processed. This problem is particularly acute for unstructured data like text and other non-enumerated types that are common in log and message streams and not analyzed at scale for precisely this reason. To address such issues, this paper proposes a method for hierarchical clustering using adaptive hash (AdaHash) values that can be re-calculated during a periodic batch process and used for subsequent streaming processing at the speed of data arrival, assuming sufficient distributed compute resources. We demonstrate that this method is as fast as other (optimal) hashing methods while enabling an adaptive hash function on each batch cycle within a lambda architecture computing framework.
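The abstract does not give the hash construction itself, so the following is only a minimal sketch of the general pattern it describes: a periodic batch step re-fits the parameters of a hash function, and a streaming step applies the current hash to each arriving record. The token-frequency heuristic and the names fit_hash_vocab, adaptive_hash, and cluster_path are assumptions made for illustration and are not taken from the paper.

```python
from collections import Counter
from typing import Iterable, List, Tuple


def fit_hash_vocab(batch: Iterable[str], max_tokens: int = 2) -> List[str]:
    """Batch step (hypothetical heuristic, not the paper's method):
    keep the tokens that occur in the most records of the batch."""
    counts = Counter(tok for line in batch for tok in set(line.split()))
    # Sort by descending record count, then alphabetically for determinism.
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [tok for tok, _ in ranked[:max_tokens]]


def adaptive_hash(line: str, vocab: List[str]) -> Tuple[str, ...]:
    """Streaming step: map one record to a key in a single pass.
    The key lists the vocabulary tokens present, in vocabulary order."""
    present = set(line.split())
    return tuple(tok for tok in vocab if tok in present)


def cluster_path(key: Tuple[str, ...]) -> List[Tuple[str, ...]]:
    """Each prefix of the key names a coarser cluster, giving a hierarchy."""
    return [key[:i] for i in range(1, len(key) + 1)]


# One batch cycle: re-fit the hash parameters on recent data.
batch = ["error disk full", "error disk slow", "error net down", "login ok"]
vocab = fit_hash_vocab(batch)

# Streaming: hash each new record with the current parameters.
key = adaptive_hash("error disk failing", vocab)
path = cluster_path(key)
```

In this sketch the streaming step costs one pass over the record, and the clusters adapt only when the batch step is re-run on newer data, which mirrors the batch and streaming split of a lambda architecture.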

Journal: International Journal of Data Science and Analytics
Publication Date: Aug 1, 2018
Citations: 8

Similar Papers

CC_TRS: Continuous Clustering of Trajectory Stream Data Based on Micro Cluster Life
Musaab Riyadh ... Nurfadhlina Binti Mohd Sharef
Mathematical Problems in Engineering | VOL. 2017
01 Jan 2017

Detecting Arbitrarily Oriented Subspace Clusters in Data Streams Using Hough Transform
Felix Borutta ... Daniyal Kazempour
01 Jan 2020

A Comparative Study of Density-based Clustering Algorithms on Data Streams: Micro-clustering Approaches
Amineh Amini ... Teh Ying Wah
11 Dec 2011

Hierarchical clustering for multiple nominal data streams with evolving behaviour
Jerry W Sangma ... Yogita
Complex & Intelligent Systems | VOL. 8
07 Jan 2022