Abstract

Because distributed stream analytics is offered as a cloud service, there is a pressing need for reliability and fault tolerance: streaming data tuples must be processed in the order of their generation on every dataflow path, with each tuple processed once and only once. Two kinds of approaches currently exist. One treats the whole process as a single transaction and therefore loses intermediate results when a failure occurs. The other relies on the receipt of an acknowledgement (ACK), on a per-tuple basis, to decide whether to move forward and emit the next resulting tuple or to resend the current one after a timeout, and thus incurs an extremely high latency penalty. In contrast, we propose a backtrack mechanism for failure recovery. It allows a task to process tuples continuously, without waiting for ACKs and without resending tuples in the failure-free case; a task requests (ASK) its source tasks to resend the missing tuples only when it is restored from a failure, which is rare and therefore has limited impact on overall performance. The specific hard problem in building a transaction layer on top of an existing stream processing platform is how to keep track of the physical input/output messaging channels in order to realize re-messaging during failure recovery. Our solution tracks physical messaging channels logically; for that purpose we introduce the notions of virtual channel, task alias and messageId-set for reasoning about, recording and communicating the channel information. We also provide a designated messaging channel, separate from the regular dataflow channel, for signaling ACK/ASK messages and for resending tuples, in order to avoid interrupting the regular order of data transfer. We have implemented the proposed mechanisms on Fontainebleau, the distributed stream analytics infrastructure we developed on top of Storm. As a principle, we ensure that all transactional properties are system supported and transparent to users. Our experience shows the novelty and efficiency of the proposed mechanisms.
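
The ASK-based backtrack idea described in the abstract can be illustrated with a minimal sketch. It is not the Fontainebleau implementation: the class names, the per-channel integer counter standing in for the messageId-set, the in-memory checkpoint, and the untrimmed resend log are all assumptions made for illustration. The pair (source alias, target alias) plays the role of a virtual channel.

```python
from collections import defaultdict


class SourceTask:
    """Hypothetical upstream task that logs emitted tuples for later resending."""

    def __init__(self, alias):
        self.alias = alias                  # logical name, stable across restarts
        self.next_id = defaultdict(int)     # target alias -> next message id
        self.log = defaultdict(list)        # target alias -> [(msg_id, value)]

    def emit(self, target_alias, value):
        # Emit without waiting for any ACK; only remember what was sent.
        msg_id = self.next_id[target_alias]
        self.next_id[target_alias] += 1
        self.log[target_alias].append((msg_id, value))
        return (self.alias, msg_id, value)

    def resend_after(self, target_alias, last_id):
        # Answer an ASK: return logged messages with id greater than last_id.
        return [(self.alias, m, v) for m, v in self.log[target_alias] if m > last_id]


class TargetTask:
    """Hypothetical downstream task that keeps a running sum as its state."""

    def __init__(self, alias):
        self.alias = alias
        self.total = 0                              # example execution state
        self.last_id = defaultdict(lambda: -1)      # source alias -> last processed id
        self._checkpoint = (0, {})                  # assumed durable in a real system

    def process(self, message):
        source_alias, msg_id, value = message
        if msg_id != self.last_id[source_alias] + 1:
            return False                            # duplicate or gap: do not apply
        self.total += value
        self.last_id[source_alias] = msg_id
        return True

    def checkpoint(self):
        self._checkpoint = (self.total, dict(self.last_id))

    def recover(self, sources):
        # Restore the checkpointed state, then ASK each source for missing tuples.
        total, last_id = self._checkpoint
        self.total = total
        self.last_id = defaultdict(lambda: -1, last_id)
        for src in sources:
            for message in src.resend_after(self.alias, self.last_id[src.alias]):
                self.process(message)


def demo():
    src, tgt = SourceTask("src-0"), TargetTask("sum-0")
    for v in (1, 2):
        tgt.process(src.emit("sum-0", v))
    tgt.checkpoint()
    for v in (3, 4):
        tgt.process(src.emit("sum-0", v))
    tgt.total, tgt.last_id = None, None             # simulate a crash
    tgt.recover([src])
    return tgt.total, tgt.last_id["src-0"]
```

In the failure-free path the source never blocks and never resends; the resend log is consulted only inside `recover`. A real system would also need to trim that log and persist checkpoints, which this sketch leaves out.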

Highlights

  • Stream analytics as a cloud service has become a new trend in supporting mission-critical, continuous dataflow applications.

  • Our experience shows that the novel virtual channel mechanism allows us to handle failure recovery correctly in elastic streaming processes, and that the ASK-based recovery mechanism significantly outperforms the ACK-based one.

  • In this work we have taken an initial step to advance the state of the art of transactional stream processing with task-oriented, fine-grained and backtrack-based failure recovery mechanisms.

Introduction

Stream analytics as a cloud service has become a new trend in supporting mission-critical, continuous dataflow applications. This has raised the need for reliability and fault tolerance in distributed stream processing. A stream processing operation is a continuous operation applied to the input stream tuple by tuple, deriving a new output stream. In a distributed stream processing infrastructure, a logical operation may physically have multiple instances running in parallel, called tasks. A task runs cycle by cycle to transform a stream into a new stream: in each cycle it processes an input tuple, updates the execution state and emits the resulting tuples, carried in messages, to its target tasks.
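
The task model described above can be sketched as follows. This is an illustration only: the word-count operation, the `WordCountTask` and `route` names, and the byte-sum partitioning rule are assumptions, not part of the paper. One logical operation is served by several task instances, and each cycle consumes one tuple, updates the task's state and returns the resulting tuples.

```python
from collections import Counter


class WordCountTask:
    """Hypothetical task: one parallel instance of a logical word-count operation."""

    def __init__(self, task_id):
        self.task_id = task_id
        self.counts = Counter()          # execution state of this instance

    def run_cycle(self, word):
        # One cycle: process the input tuple, update state, emit resulting tuples.
        self.counts[word] += 1
        return [(word, self.counts[word])]


def route(word, num_tasks):
    # Deterministic partitioning so the same word always reaches the same task.
    return sum(word.encode("utf-8")) % num_tasks


def run(stream, num_tasks=2):
    tasks = [WordCountTask(i) for i in range(num_tasks)]
    output = []
    for word in stream:
        task = tasks[route(word, num_tasks)]
        output.extend(task.run_cycle(word))
    return output
```

Because each word is always routed to the same instance, the per-word counts stay correct even though the state is split across tasks.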
