CloudClustering: Toward an Iterative Data Processing Pattern on the Cloud

Ankur Dave, Roger Barga, Jared Jackson, Wei Lu

doi:10.1109/ipdps.2011.258

Abstract

As cloud computing brings the potential for large-scale data analysis to a broader community, architectural patterns for data analysis on the cloud, especially those addressing iterative algorithms, are increasingly useful. MapReduce suffers performance limitations for this purpose because it is not inherently designed for iterative algorithms. In this paper we describe our implementation of CloudClustering, a distributed k-means clustering algorithm on Microsoft's Windows Azure cloud. The k-means algorithm makes a good case study because its characteristics are representative of many iterative data analysis algorithms. CloudClustering adopts a novel architecture to improve performance without sacrificing fault tolerance. To achieve this goal, we introduce a distributed fault tolerance mechanism called the buddy system, and we make use of data affinity and checkpointing. Our goal is to generalize this architecture into a pattern for large-scale iterative data analysis on the cloud.
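The iterative structure that makes k-means a representative case can be illustrated with a minimal single-process sketch. This is not the authors' Azure implementation: the function names (`nearest`, `partial_sums`, `kmeans`) are hypothetical, the list of partitions stands in for data held by separate workers, and the buddy system, data affinity, and checkpointing are omitted. It only shows the repeated pattern of per-partition partial sums followed by a global centroid update.

```python
from typing import List, Tuple

Point = Tuple[float, float]


def nearest(p: Point, centroids: List[Point]) -> int:
    """Return the index of the centroid closest to p (squared Euclidean distance)."""
    return min(
        range(len(centroids)),
        key=lambda i: (p[0] - centroids[i][0]) ** 2 + (p[1] - centroids[i][1]) ** 2,
    )


def partial_sums(partition: List[Point], centroids: List[Point]) -> List[List[float]]:
    """Worker-side step (hypothetical): per-centroid [sum_x, sum_y, count] for one partition."""
    sums = [[0.0, 0.0, 0] for _ in centroids]
    for p in partition:
        i = nearest(p, centroids)
        sums[i][0] += p[0]
        sums[i][1] += p[1]
        sums[i][2] += 1
    return sums


def kmeans(partitions: List[List[Point]], centroids: List[Point],
           max_iters: int = 20) -> List[Point]:
    """Iterate assignment and update steps until the centroids stop changing."""
    for _ in range(max_iters):
        totals = [[0.0, 0.0, 0] for _ in centroids]
        # Assumption: in a distributed setting each partition would be
        # processed by a separate worker; here they run sequentially.
        for part in partitions:
            for i, (sx, sy, n) in enumerate(partial_sums(part, centroids)):
                totals[i][0] += sx
                totals[i][1] += sy
                totals[i][2] += n
        # Keep the old centroid when no points were assigned to it.
        updated = [
            (sx / n, sy / n) if n else c
            for (sx, sy, n), c in zip(totals, centroids)
        ]
        if updated == centroids:
            break
        centroids = updated
    return centroids
```

Each pass over the loop reuses the same partitions with new centroids, which is the access pattern the abstract says MapReduce is not inherently designed for.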

Similar Papers

Iterative data analysis for sensing applications
Ella Peltonen
01 Mar 2015

Hybrid methods for cybersecurity analysis :
Daniel Dunlavy
01 Jan 2014

A Reference Architecture for High-Availability Automatic Failover between PaaS Cloud Providers
Ivor D Addo ... Sheikh I Ahamed
01 Jun 2014

Determination of accuracy and probability in the analysis of large-scale biomedical data
Stella Vetova
01 Jan 2021
