Abstract

Distributed Stream Processing Systems (DSPS) are widely used to process unbounded data streams in real time. Low processing latency is a fundamental requirement for DSPS applications to maintain real‐time responsiveness. This requirement is severely undermined by the inevitable failures of computing systems. DSPS generally cope with these failures by triggering periodic checkpoints, which pessimistically persist the application state so that execution can resume after a failure. Because they are triggered at a high frequency, periodic checkpoints incur high overheads that increase the overall execution time. Failures in real‐world systems, however, do not occur periodically. This mismatch between periodic checkpointing and real‐world failure distributions makes periodic checkpoints inefficient. We propose a failure‐aware adaptive fault tolerance model, called FATM, which triggers checkpoints in line with the underlying failure rate. Further, we design a model of the utility factor and checkpoint overheads to evaluate the performance of fault tolerance models for DSPS. We implement FATM atop Apache Flink and perform a series of experiments. To validate the effectiveness of FATM, we compare the experimental results with those of existing checkpoint‐based models for DSPS. The results show that FATM significantly reduces the checkpoint frequency, increases the utility factor, and reduces the checkpoint overheads by 28%.
