Stream Processing Engines Research Articles

Pervasive applications rely on increasingly complex streams of sensor data continuously captured from the physical world. Such data is crucial to enable applications to “understand” the current context and to infer the right actions to perform, be they fully automatic or involving some user decisions. However, the continuous nature of such streams, the relatively high throughput at which data is generated, and the number of sensors usually deployed in the environment make direct data handling practically infeasible. Data must not only be cleaned but also filtered and aggregated, to relieve higher-level algorithms of the near-real-time handling of such massive data flows. We propose here a stream-processing framework (spChains), based upon state-of-the-art stream processing engines, which enables declarative and modular composition of stream processing chains built atop a set of extensible stream processing blocks. While stream processing blocks are delivered as a standard, yet extensible, library of application-independent processing elements, chains can be defined by the pervasive application engineering team. We demonstrate the flexibility and effectiveness of the spChains framework on two real-world applications, in the energy management and industrial plant management domains, by evaluating them on a prototype implementation based on the Esper stream processor.
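
The idea of building a chain from reusable, application-independent blocks (cleaning, then aggregation, then event detection) can be illustrated with a minimal sketch. It is not the spChains API and does not use Esper: the block names `filter_block`, `sliding_average_block`, `threshold_block` and `chain` are hypothetical, and plain Python generators stand in for a real stream processing engine.

```python
from collections import deque
from typing import Callable, Deque, Iterable, Iterator

# Hypothetical block type: a function from an input stream to an output stream.
Block = Callable[[Iterable[float]], Iterator[float]]


def filter_block(pred: Callable[[float], bool]) -> Block:
    """Cleaning step: drop samples for which `pred` is false."""
    def run(stream: Iterable[float]) -> Iterator[float]:
        return (x for x in stream if pred(x))
    return run


def sliding_average_block(size: int) -> Block:
    """Aggregation step: emit the mean of the last `size` samples once the window is full."""
    def run(stream: Iterable[float]) -> Iterator[float]:
        window: Deque[float] = deque(maxlen=size)
        for x in stream:
            window.append(x)
            if len(window) == size:
                yield sum(window) / size
    return run


def threshold_block(limit: float) -> Block:
    """Event step: forward only values strictly above `limit`."""
    def run(stream: Iterable[float]) -> Iterator[float]:
        return (x for x in stream if x > limit)
    return run


def chain(*blocks: Block) -> Block:
    """Compose blocks left to right; each block lazily consumes the previous block's output."""
    def run(stream: Iterable[float]) -> Iterator[float]:
        out: Iterable[float] = stream
        for block in blocks:
            out = block(out)
        return iter(out)
    return run


# Usage: a power-monitoring chain over raw meter readings in watts (made-up sample data).
readings = [120.0, 130.0, -1.0, 400.0, 410.0, 420.0]
high_load = chain(
    filter_block(lambda w: w >= 0),  # discard invalid negative readings
    sliding_average_block(3),        # smooth over a 3-sample window
    threshold_block(300.0),          # report only sustained high consumption
)
result = list(high_load(readings))
print(result)
```

The invalid reading is dropped first, so only the last two 3-sample averages (about 313.3 W and 410.0 W) pass the threshold. Because each block is a generator, events flow through the chain one at a time, and the same blocks can be recombined into other chains without modification, which mirrors the separation between a block library and application-defined chains described above.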

Over the past few years, Stream Processing Engines (SPEs) have emerged as a new class of software systems, enabling low-latency processing of streams of data arriving at high rates. As SPEs mature and are used in monitoring applications that must run continuously (e.g., in network security monitoring), a significant challenge arises: SPEs must be able to handle various software and hardware faults that occur, masking them to provide high availability (HA). In this article, we develop, implement, and evaluate DPC (Delay, Process, and Correct), a protocol to handle crash failures of processing nodes and network failures in a distributed SPE. Like previous approaches to HA, DPC uses replication and masks many types of node and network failures. In the presence of network partitions, the designer of any replication system faces a choice between providing availability or data consistency across the replicas. In DPC, this choice is made explicit: the user specifies an availability bound (no result should be delayed by more than a specified delay threshold even under failure if the corresponding input is available), and DPC attempts to minimize the resulting inconsistency between replicas (not all of which might have seen the input data) while meeting the given delay threshold. Although conceptually simple, the DPC protocol tolerates multiple simultaneous failures as well as any further failures that occur during recovery. This article describes DPC and its implementation in the Borealis SPE. We show that DPC enables a distributed SPE to maintain low-latency processing at all times, while also achieving eventual consistency, where applications eventually receive the complete and correct output streams. Furthermore, we show that, independent of system size and failure location, it is possible to handle failures lasting almost up to the user-specified bound in a manner that meets the required availability without introducing any inconsistency.
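
The delay/process/correct trade-off can be made concrete with a toy, single-operator sketch. It is not DPC as implemented in Borealis, which operates across replicated processing nodes; the class `ToyDpcSum`, the `Output` record, the integer time buckets, and the `delay_bound` parameter (standing in for the user-specified delay threshold) are all hypothetical simplifications.

```python
from dataclasses import dataclass
from typing import Dict, List, Set


@dataclass
class Output:
    bucket: int
    value: float
    tentative: bool  # True if computed without all inputs (possibly inconsistent)


class ToyDpcSum:
    """Hypothetical operator summing one value per input stream for each time bucket.

    Delay:   wait for all inputs of a bucket, but at most `delay_bound` ticks.
    Process: past the bound, emit a tentative result from the inputs present.
    Correct: when the missing inputs finally arrive, emit a stable result.
    """

    def __init__(self, inputs: List[str], delay_bound: int) -> None:
        self.inputs = set(inputs)
        self.delay_bound = delay_bound
        self.pending: Dict[int, Dict[str, float]] = {}  # bucket -> source -> value
        self.first_seen: Dict[int, int] = {}            # bucket -> arrival time of first input
        self.tentative_sent: Set[int] = set()
        self.out: List[Output] = []

    def on_tuple(self, now: int, bucket: int, source: str, value: float) -> None:
        vals = self.pending.setdefault(bucket, {})
        self.first_seen.setdefault(bucket, now)
        vals[source] = value
        if set(vals) == self.inputs:
            # All inputs present: emit the stable result (a correction if a tentative one was sent).
            self.out.append(Output(bucket, sum(vals.values()), tentative=False))
            del self.pending[bucket]
            del self.first_seen[bucket]
            self.tentative_sent.discard(bucket)

    def on_tick(self, now: int) -> None:
        for bucket, vals in self.pending.items():
            waited = now - self.first_seen[bucket]
            if waited >= self.delay_bound and bucket not in self.tentative_sent:
                # Delay bound reached: favor availability over consistency.
                self.out.append(Output(bucket, sum(vals.values()), tentative=True))
                self.tentative_sent.add(bucket)


# Usage: stream "s2" is cut off (e.g. by a network partition) for bucket 1, then heals.
op = ToyDpcSum(["s1", "s2"], delay_bound=2)
op.on_tuple(0, 0, "s1", 1.0)
op.on_tuple(0, 0, "s2", 2.0)  # bucket 0 complete: stable result 3.0
op.on_tuple(1, 1, "s1", 5.0)  # bucket 1 still missing "s2"
for t in (2, 3, 4):
    op.on_tick(t)             # at t=3 the bound is hit: tentative result 5.0
op.on_tuple(5, 1, "s2", 4.0)  # late input arrives: corrected stable result 9.0
for o in op.out:
    print(o)
```

The sketch keeps only the core trade-off from the abstract: results are never held back longer than the bound when some input is available, and any inconsistency introduced by processing partial input is later repaired, so the consumer eventually sees the complete and correct value for every bucket.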

Articles published on Stream Processing Engines

Modeling the execution semantics of stream processing engines with SECRET

StreamCloud: An Elastic and Scalable Data Streaming System

SpChains: A Declarative Framework for Data Stream Processing in Pervasive Applications

Efficient and Adaptive Stateful Replication for Stream Processing Engines in High-Availability Cluster

UpStream

Changing flights in mid-air

SECRET

Multi-scale neural texture classification using the GPU as a stream processing engine

Maintaining consistent results of continuous queries under diverse window specifications

Fault-tolerant stream processing using a distributed, replicated file system

Fault-tolerance in the Borealis distributed stream processing system

Retrospective on Aurora