Abstract

The paper discusses the shift in the computing paradigm and the programming model for Big Data problems and applications. We compare the DataFlow and ControlFlow programming models in terms of their quantitative and qualitative aspects. Big Data problems and applications that are suitable for implementation on DataFlow computers should not be measured with the same metrics as ControlFlow computers. We propose a new benchmarking methodology that takes into account not only the execution time but also the power and space needed to complete the task. Recent research shows that if the TOP500 ranking were based on these new performance measures, DataFlow machines would outperform ControlFlow machines. To support these claims, we present eight recent implementations of various algorithms using the DataFlow paradigm, which show considerable speed-ups, power reductions, and space savings over their ControlFlow counterparts.

Highlights

  • Big Data is becoming a reality in more and more research areas every year

  • Concrete measurement data from real applications in geophysics [1,2], financial engineering [3], and several other research fields [8,9,10,11,12] shows that a DataFlow machine rates better than a ControlFlow machine when a different benchmark and a different ranking methodology are used

  • In this paper we argue that the best methodology for TOP500 benchmarking would be based on the holistic performance measure H(T_BigData, N_1U), defined as the number of 1U boxes N_1U (one rack unit or equivalent) needed to accomplish the desired execution time T_BigData on a given Big Data benchmark (see the sketch after these highlights)
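The holistic measure in the last highlight can be written out more explicitly. The following is a minimal sketch in our own notation (the function T(b, n) and the minimization form are our assumptions, not taken from the paper): the measure asks for the smallest machine, counted in 1U boxes, that completes a given Big Data benchmark within the desired time.

```latex
% Hedged formalization of H(T_BigData, N_1U); the notation below is ours.
% T(b, n): execution time of Big Data benchmark b on a machine built from n 1U boxes.
% Each 1U box is assumed to occupy the same space and draw the same power,
% regardless of the technology inside it, so n also proxies space and power.
H\bigl(T_{\mathrm{BigData}}, N_{1U}\bigr)
  = \min\bigl\{\, n \in \mathbb{N} : T(b, n) \le T_{\mathrm{BigData}} \,\bigr\},
\qquad \text{lower is better.}
```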


Summary

Introduction

Big Data is becoming a reality in more and more research areas every year, and Big Data applications are becoming more visible as they slowly enter areas that concern the general public. The strength of the DataFlow approach is that it accelerates application loops by one or two orders of magnitude; how many orders of magnitude depends on the amount of data reusability within the loops. This feature is enabled by compiling down to levels much below the machine code, which brings important additional effects: much lower execution time, equipment size, and power dissipation.

Concrete measurement data from real applications in geophysics [1,2], financial engineering [3], and other research fields [8,9,10,11,12] shows that a DataFlow machine (for example, the Maxeler MAX series) rates better than a ControlFlow machine (for example, the Cray Titan) if a different benchmark is used (e.g., a Big Data benchmark) together with a different ranking methodology (e.g., the benchmark execution time multiplied by the number of 1U boxes needed to accomplish that execution time, where a 1U box represents one rack unit or equivalent; it is assumed that, no matter what technology is inside, a 1U box always has the same size and always uses the same power).
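The ranking rule sketched in the parenthetical above (benchmark execution time multiplied by the number of 1U boxes) can be stated as a small program. The sketch below is ours; the machine names and figures are illustrative placeholders under the stated 1U assumptions, not measurements from the cited applications.

```python
# Hedged sketch of the ranking rule described above: rank machines by
# (benchmark execution time) x (number of 1U boxes used to reach that time).
# Names and numbers are illustrative placeholders, not results from the paper.
from dataclasses import dataclass


@dataclass
class Machine:
    name: str
    execution_time_s: float  # time to complete the chosen Big Data benchmark
    boxes_1u: int            # 1U boxes (rack units or equivalent) occupied

    @property
    def score(self) -> float:
        # Lower is better: the product penalizes both slow execution and large
        # installations. Since every 1U box is assumed to draw the same power,
        # the box count also stands in for power dissipation and floor space.
        return self.execution_time_s * self.boxes_1u


machines = [
    Machine("dataflow-example", execution_time_s=1200.0, boxes_1u=4),
    Machine("controlflow-example", execution_time_s=600.0, boxes_1u=40),
]

for m in sorted(machines, key=lambda m: m.score):
    print(f"{m.name}: score = {m.score:.0f} (seconds x 1U boxes)")
```

Under this rule, a machine that is somewhat slower but dramatically smaller and less power-hungry can still rank first, which is the effect the paper argues in favour of DataFlow machines.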

