Toward Efficient and Simplified Distributed Data Intensive Computing

Yunhong Gu, R Grossman

doi:10.1109/tpds.2011.67

Abstract

While the capability of computing systems has been increasing at the rate predicted by Moore's Law, the amount of digital data has been increasing even faster. There is a growing need for systems that can manage and analyze very large data sets, preferably on shared-nothing commodity systems because of their low cost. In this paper, we describe the design and implementation of a distributed file system called Sector and an associated programming framework called Sphere that processes the data managed by Sector in parallel. Sphere is designed so that data can be processed in place, on the nodes where it is stored, whenever possible; this is sometimes called data locality. We describe the directives Sphere supports to improve data locality. In our experimental studies, the Sector/Sphere system has consistently performed about 2 to 4 times faster than Hadoop, the most popular system for processing very large data sets.
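The idea of data locality, running each piece of processing on a node that already stores the data, can be illustrated with a small scheduling sketch. This is a hypothetical illustration, not Sector/Sphere's actual scheduler or API: the function name `assign_segments`, the segment-to-replica map, the "fewest replicas first" ordering, and the least-loaded tie-breaking rule are all assumptions made for this example.

```python
def assign_segments(replicas, nodes):
    """Hypothetical locality-aware assignment (not the Sphere API).

    replicas: dict mapping segment id -> list of node names holding a copy
    nodes: list of currently available node names
    Returns (assignment dict segment -> node, count of locally processed segments).
    """
    load = {n: 0 for n in nodes}
    assignment = {}
    local = 0
    # Schedule segments with the fewest replicas first: they have the
    # fewest local options, so they should pick before others fill those nodes.
    for seg in sorted(replicas, key=lambda s: len(replicas[s])):
        candidates = [n for n in replicas[seg] if n in load]
        if candidates:
            # Data locality: process the segment where it already resides,
            # choosing the least loaded replica holder.
            target = min(candidates, key=lambda n: load[n])
            local += 1
        else:
            # No available node holds a replica, so fall back to the least
            # loaded node, which would have to fetch the data over the network.
            target = min(nodes, key=lambda n: load[n])
        assignment[seg] = target
        load[target] += 1
    return assignment, local


# Example: node "D" holds seg4 but is not currently available.
replicas = {
    "seg0": ["A", "B"],
    "seg1": ["B"],
    "seg2": ["C"],
    "seg3": ["A", "C"],
    "seg4": ["D"],
}
nodes = ["A", "B", "C"]
assignment, local = assign_segments(replicas, nodes)
print(assignment, local)
```

In this example, four of the five segments are processed on a node that stores them; only seg4 must be moved, because its sole replica lives on an unavailable node. A real system would also weigh factors such as network topology and node capacity, which this sketch ignores.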


Journal: IEEE Transactions on Parallel and Distributed Systems
Publication Date: Jun 1, 2011
Citations: 39

Similar Papers

Research on Fuzzy Clustering Algorithms for Large Dimensional Data Sets Under Cloud Computing
Shuang-Cheng Jia et al.
01 Jan 2020

Machine learning for Big Data analytics in plants.
Chuang Ma ... Xiangfeng Wang
Trends in Plant Science | VOL. 19
14 Sep 2014

Protein Identification False Discovery Rates for Very Large Proteomics Data Sets Generated by Tandem Mass Spectrometry
Lukas Reiter ... Ruedi Aebersold
Molecular & Cellular Proteomics | VOL. 8
01 Nov 2009

Collection, processing, interpretation and modelling of digital outcrop data using VRGS: An integrated approach to outcrop modelling
D Hodgetts
01 Jan 2009
