Abstract
Streaming applications process possibly infinite streams of data and often have both high throughput and low latency requirements. They are composed of operator graphs that produce and consume data tuples. General streaming applications use stateful, selective, and user-defined operators. The stream programming model naturally exposes task and pipeline parallelism, enabling such applications to exploit parallel systems of all kinds, including large clusters. However, data parallelism must either be introduced manually by programmers, or extracted as an optimization by compilers. Previous data-parallel optimizations did not apply to selective, stateful, and user-defined operators. This article presents a compiler and runtime system that automatically extracts data parallelism for general stream processing. Data parallelization is safe if the transformed program has the same semantics as the original sequential version. The compiler forms parallel regions while considering operator selectivity, state, partitioning, and graph dependencies. The distributed runtime system ensures that tuples always exit parallel regions in the same order they would without data parallelism, using the most efficient strategy as identified by the compiler. Our experiments using 100 cores across 14 machines show linear scalability for parallel regions that are computation-bound, and near linear scalability when tuples are shuffled across parallel regions.
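To make the ordering requirement concrete, the sketch below shows one simple way a runtime could restore tuple order at the exit of a parallel region, using sequence numbers assigned at the split point. The names `SequencedTuple` and `OrderedMerger` are hypothetical, and the sketch assumes every input tuple produces exactly one output tuple (selectivity of one); the system described in the article handles more general selectivity and chooses among ordering strategies identified by the compiler.

```java
// Illustrative sketch only: reordering tuples at the exit of a data-parallel
// region using sequence numbers assigned before the split. Not the paper's
// implementation; class and method names are hypothetical.
import java.util.PriorityQueue;
import java.util.function.Consumer;

final class SequencedTuple<T> {
    final long seq;   // sequence number assigned at the split point
    final T payload;  // tuple produced by one parallel channel
    SequencedTuple(long seq, T payload) { this.seq = seq; this.payload = payload; }
}

final class OrderedMerger<T> {
    // Buffer of tuples that arrived ahead of their turn, ordered by sequence number.
    private final PriorityQueue<SequencedTuple<T>> pending =
        new PriorityQueue<>((a, b) -> Long.compare(a.seq, b.seq));
    private long nextSeq = 0;              // next sequence number to release
    private final Consumer<T> downstream;  // operator after the parallel region

    OrderedMerger(Consumer<T> downstream) { this.downstream = downstream; }

    // Called by any parallel channel when it finishes a tuple. Channels may
    // complete out of order, but tuples are forwarded only in sequence order,
    // so the downstream operator sees the same order as the sequential program.
    synchronized void accept(SequencedTuple<T> t) {
        pending.add(t);
        while (!pending.isEmpty() && pending.peek().seq == nextSeq) {
            downstream.accept(pending.poll().payload);
            nextSeq++;
        }
    }
}
```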