Abstract

The paper discusses the shift in the computing paradigm and the programming model for Big Data problems and applications. We compare the DataFlow and ControlFlow programming models in terms of their quantitative and qualitative aspects. Big Data problems and applications that are suitable for implementation on DataFlow computers should not be measured with the same metrics as ControlFlow computers. We propose a new benchmarking methodology that takes into account not only the execution time but also the power and space needed to complete the task. Recent research shows that if the TOP500 ranking were based on these new performance measures, DataFlow machines would outperform ControlFlow machines. To support these claims, we present eight recent implementations of various algorithms using the DataFlow paradigm, which show considerable speed-ups, power reductions, and space savings over their ControlFlow counterparts.

Highlights

  • Big Data is becoming a reality in more and more research areas every year

  • Concrete measurement data from real applications in geophysics [1,2], financial engineering [3], and several other research fields [8,9,10,11,12] shows that a DataFlow machine rates better than a ControlFlow machine when a different benchmark and a different ranking methodology are used

  • In this paper we argue that the best methodology for TOP500 benchmarking would be based on the holistic performance measure H(T_BigData, N_1U), defined as the number of 1U boxes N_1U (one rack unit or equivalent) needed to accomplish the desired execution time T_BigData on a given Big Data benchmark (see the sketch after these highlights)
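The holistic measure in the last highlight can be written out more explicitly. The following is a minimal sketch in our own notation (the function T(b, n) and the minimization form are our assumptions, not taken from the paper): the measure asks for the smallest machine, counted in 1U boxes, that completes a given Big Data benchmark within the desired time.

```latex
% Hedged formalization of H(T_BigData, N_1U); the notation below is ours.
% T(b, n): execution time of Big Data benchmark b on a machine built from n 1U boxes.
% Each 1U box is assumed to occupy the same space and draw the same power,
% regardless of the technology inside it, so n also proxies space and power.
H\bigl(T_{\mathrm{BigData}}, N_{1U}\bigr)
  = \min\bigl\{\, n \in \mathbb{N} : T(b, n) \le T_{\mathrm{BigData}} \,\bigr\},
\qquad \text{lower is better.}
```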


Summary

Introduction

Big Data is becoming a reality in more and more research areas every year, and Big Data applications are becoming more visible as they slowly enter areas that concern the general public. The strength of the DataFlow approach is that it accelerates application loops by one or two orders of magnitude; how many orders of magnitude depends on the amount of data reusability within the loops. This feature is enabled by compiling down to levels much below the machine code, which brings important additional effects: much lower execution time, equipment size, and power dissipation.

Concrete measurement data from real applications in geophysics [1,2], financial engineering [3], and other research fields [8,9,10,11,12] shows that a DataFlow machine (for example, the Maxeler MAX series) rates better than a ControlFlow machine (for example, the Cray Titan) if a different benchmark is used (e.g., a Big Data benchmark) together with a different ranking methodology (e.g., the benchmark execution time multiplied by the number of 1U boxes needed to accomplish that execution time, where a 1U box represents one rack unit or equivalent; it is assumed that, no matter what technology is inside, a 1U box always has the same size and always uses the same power).
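The ranking rule sketched in the parenthetical above (benchmark execution time multiplied by the number of 1U boxes) can be stated as a small program. The sketch below is ours; the machine names and figures are illustrative placeholders under the stated 1U assumptions, not measurements from the cited applications.

```python
# Hedged sketch of the ranking rule described above: rank machines by
# (benchmark execution time) x (number of 1U boxes used to reach that time).
# Names and numbers are illustrative placeholders, not results from the paper.
from dataclasses import dataclass


@dataclass
class Machine:
    name: str
    execution_time_s: float  # time to complete the chosen Big Data benchmark
    boxes_1u: int            # 1U boxes (rack units or equivalent) occupied

    @property
    def score(self) -> float:
        # Lower is better: the product penalizes both slow execution and large
        # installations. Since every 1U box is assumed to draw the same power,
        # the box count also stands in for power dissipation and floor space.
        return self.execution_time_s * self.boxes_1u


machines = [
    Machine("dataflow-example", execution_time_s=1200.0, boxes_1u=4),
    Machine("controlflow-example", execution_time_s=600.0, boxes_1u=40),
]

for m in sorted(machines, key=lambda m: m.score):
    print(f"{m.name}: score = {m.score:.0f} (seconds x 1U boxes)")
```

Under this rule, a machine that is somewhat slower but dramatically smaller and less power-hungry can still rank first, which is the effect the paper argues in favour of DataFlow machines.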

