Abstract
In the stream processing domain, applications are represented by graphs of arbitrarily connected operators, each filled with its business logic code. The APIs of existing Stream Processing Systems (SPSs) ease the development of transformations that recur in streaming practice (e.g., filtering, aggregation and joins). In contrast, their parallelism abstractions are quite limited: they support only stateless operators, or stateful operators whose state is organized as a set of key-value pairs. This paper presents how the parallel patterns methodology can be revisited for sliding-window streaming analytics. Our vision fosters a design process in which the application is built by composing and nesting ready-to-use patterns provided through a C++17 fluent interface. Our prototype implements the run-time system of the patterns in the FastFlow parallel library, which expresses thread-based parallelism. The experimental analysis shows interesting outcomes. First, our pattern-based approach allows easy prototyping of different versions of the application, and the programmer can leverage nesting of patterns to increase performance (up to 37% in one of the two considered test-bed cases). Second, our FastFlow implementation is three times faster than the handmade porting of our patterns to popular JVM-based SPSs. Finally, in the concluding part of this paper, we explore the use of a task-based run-time system, deriving interesting insights into how to make our patterns library suitable for multiple back ends.
Highlights
The data deluge generated by our ever-more-connected world raises the need for easy-to-use frameworks able to efficiently process data streams in real time.
Such frameworks should provide high-level, user-friendly programming interfaces that ease the development of efficient streaming applications. They should enable efficient execution on modern hardware, not limited to clusters as in traditional systems like Apache Storm [1] and Apache Flink [2], but also on modern powerful scale-up servers equipped with tens of cores and terabytes of memory.
In our prior work [5], we proposed a set of parallel patterns targeting continuous analytics based on sliding windows.
Summary
The data deluge generated by our ever-more-connected world raises the need for easy-to-use frameworks able to efficiently process data streams in real time. Such frameworks should provide high-level, user-friendly programming interfaces that ease the development of efficient streaming applications. They should enable efficient execution on modern hardware, not limited to clusters as in traditional systems like Apache Storm [1] and Apache Flink [2], but also on modern powerful scale-up servers equipped with tens of cores and terabytes of memory. In our prior work [5], we proposed a set of parallel patterns targeting continuous analytics based on sliding windows. Queries of this kind are supported in the existing frameworks and represent an essential part of many streaming benchmarks [6].