Abstract

Distributed architectures for efficient processing of streaming data are increasingly critical to modern information processing systems. The goal of this paper is to develop type-based programming abstractions that facilitate correct and efficient deployment of a logical specification of the desired computation on such architectures. In the proposed model, each communication link has an associated type specifying tagged data items along with a dependency relation over tags that captures the logical partial ordering constraints over data items. The semantics of a (distributed) stream processing system is then a function from input data traces to output data traces, where a data trace is an equivalence class of sequences of data items induced by the dependency relation. This data-trace transduction model generalizes both acyclic synchronous data-flow and relational query processors, and can specify computations over data streams with a rich variety of partial ordering and synchronization characteristics. We then describe a set of programming templates for data-trace transductions: abstractions corresponding to common stream processing tasks. Our system automatically maps these high-level programs to a given topology on the distributed implementation platform Apache Storm while preserving the semantics. Our experimental evaluation shows that (1) while the automatic parallelization employed by existing systems may not preserve semantics, particularly when the computation is sensitive to the ordering of data items, our programming abstractions allow a natural specification of queries that contain a mix of ordering constraints while guaranteeing correct deployment, and (2) the throughput of the automatically compiled distributed code is comparable to that of hand-crafted distributed implementations.
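To make the notion of a data trace concrete, the following is a minimal sketch (not the system described in the paper) of checking whether two sequences of tagged items denote the same data trace, using the classical projection criterion for Mazurkiewicz traces. The tags "k1", "k2", "mark" and the functions same_trace and dependent are hypothetical illustrations, and the sketch assumes the dependency relation is symmetric and reflexive, so items that share a tag stay in their original order.

```python
# Illustrative sketch only: sequences of tagged data items are compared up to
# reordering of items whose tags are independent, so each data trace is an
# equivalence class of sequences.

from typing import Callable, Iterable, List, Tuple

Item = Tuple[str, int]                    # (tag, value); the value type is arbitrary
Dependent = Callable[[str, str], bool]    # symmetric, reflexive relation over tags

def same_trace(s1: List[Item], s2: List[Item],
               dependent: Dependent, tags: Iterable[str]) -> bool:
    """Check whether two sequences denote the same data trace.

    Projection criterion: two sequences are equivalent iff, for every pair of
    dependent tags, their subsequences restricted to items with those tags
    coincide. Assumes dependency is symmetric and reflexive.
    """
    for t1 in tags:
        for t2 in tags:
            if dependent(t1, t2):
                p1 = [it for it in s1 if it[0] in (t1, t2)]
                p2 = [it for it in s2 if it[0] in (t1, t2)]
                if p1 != p2:
                    return False
    return True

# Hypothetical link type: readings for different keys ("k1", "k2") are
# unordered relative to each other, a "mark" item synchronizes the stream,
# and items with the same tag remain ordered.
def dependent(t1: str, t2: str) -> bool:
    return t1 == t2 or "mark" in (t1, t2)

tags = ["k1", "k2", "mark"]
s1 = [("k1", 3), ("k2", 5), ("mark", 0), ("k1", 7)]
s2 = [("k2", 5), ("k1", 3), ("mark", 0), ("k1", 7)]   # swaps two independent items
s3 = [("k1", 3), ("mark", 0), ("k2", 5), ("k1", 7)]   # moves an item across "mark"

print(same_trace(s1, s2, dependent, tags))  # True: the same data trace
print(same_trace(s1, s3, dependent, tags))  # False: ordering w.r.t. "mark" differs
```

Under such a link type, a deployment may freely reorder independent items (for example, across parallel workers) without changing the data trace that is computed, which is the property the compilation to Apache Storm must preserve.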
