Abstract

Processing high-throughput data streams has become a major challenge in areas such as real-time event monitoring, complex dataflow processing, and big data analytics. While distributed stream processing systems have made tremendous progress in the past few years, the high-throughput and low-latency (a.k.a. high sustainable throughput) requirements of modern applications are pushing the limits of traditional data processing infrastructures. This paper introduces a new distributed stream processing engine (DSPE), called Asynchronous Iterative Routing (or simply “AIR”), which implements a light-weight, dynamic sharding protocol. AIR enables direct and asynchronous communication among all worker nodes via a channel-like communication protocol on top of the Message Passing Interface (MPI), thereby completely avoiding the need for a dedicated driver node. The system adopts a new progress-tracking protocol, called hew-meld, which in our experiments achieves lower processing latency on this asynchronous, master-less architecture than the conventional low-watermark technique. The current version of AIR is also equipped with two fault-tolerance and recovery strategies, namely checkpointing-and-rollback and replication. With its unique design, AIR scales out particularly well to multi-core HPC architectures; specifically, we deployed it on clusters of up to 16 nodes and 448 cores (reaching a peak of 435.3 million events and 55.14 GB of data processed per second) and found it to significantly outperform existing DSPEs.

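As a rough illustration of the master-less, MPI-based design described above, the following is a minimal, hypothetical C++ sketch (not AIR's actual code): each worker rank shards a (key, value) event by key and sends it directly to the owning rank using non-blocking MPI calls, with no driver node involved. The collective count exchange is only included to keep this toy example deadlock-free; AIR itself tracks progress asynchronously.

```cpp
// Hypothetical sketch of direct, asynchronous worker-to-worker routing
// over MPI with key-based sharding and no dedicated driver node.
#include <mpi.h>
#include <cstdio>
#include <vector>

int main(int argc, char** argv) {
    MPI_Init(&argc, &argv);
    int rank = 0, size = 0;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    // Each worker produces one (key, value) event and routes it to the
    // rank that owns the key's shard -- here simply key modulo #ranks.
    int event[2] = { 31 * rank + 7, rank };
    int target   = event[0] % size;

    // Post a non-blocking send directly to the target worker.
    MPI_Request req;
    MPI_Isend(event, 2, MPI_INT, target, /*tag=*/0, MPI_COMM_WORLD, &req);

    // Element-wise sum of per-destination counts tells every rank how
    // many events to expect (kept simple here; AIR does this asynchronously).
    std::vector<int> sendCount(size, 0), totalCount(size, 0);
    sendCount[target] = 1;
    MPI_Allreduce(sendCount.data(), totalCount.data(), size,
                  MPI_INT, MPI_SUM, MPI_COMM_WORLD);

    // Receive exactly the events sharded to this rank, from any source.
    for (int i = 0; i < totalCount[rank]; ++i) {
        int incoming[2];
        MPI_Status status;
        MPI_Recv(incoming, 2, MPI_INT, MPI_ANY_SOURCE, 0,
                 MPI_COMM_WORLD, &status);
        std::printf("rank %d got key=%d value=%d from rank %d\n",
                    rank, incoming[0], incoming[1], status.MPI_SOURCE);
    }

    MPI_Wait(&req, MPI_STATUS_IGNORE);
    MPI_Finalize();
    return 0;
}
```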