Abstract

Stream processing is used in various fields. In the field of big data, stream aggregation is a popular processing technique, but it suffers serious setbacks when the order in which events (i.e., stream elements) occur differs from the order in which they arrive at the system. Such data streams are called “non-FIFO streams”. This phenomenon usually occurs in distributed environments due to factors such as network disruptions and delays. Many analysis scenarios require efficient processing of such non-FIFO streams to meet various data processing requirements. This paper proposes an efficient, scalable, checkpoint-based bidirectional indexing approach, called CPiX, for faster real-time analysis over non-FIFO streams. CPiX maintains partial aggregation results per checkpoint in an on-demand manner. CPiX needs less time and space than the state-of-the-art approach. Extensive experiments confirm that CPiX handles out-of-order streams efficiently and is, on average, about 3.8 times faster than the state-of-the-art approach while consuming less memory.
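The idea of keeping partial aggregates per checkpoint, so that a late event only updates the checkpoint its timestamp falls into, can be illustrated with a minimal Python sketch. This is not CPiX's bidirectional index; the class name `CheckpointSum`, the SUM aggregate, integer timestamps, and the requirement that the window size be a multiple of the slide are all assumptions made for illustration.

```python
from collections import defaultdict


class CheckpointSum:
    """Hypothetical checkpoint-based partial aggregation (SUM) for out-of-order events.

    Time is cut into checkpoints every `slide` units; each checkpoint keeps a partial
    sum that is created on demand. Illustrative sketch only, not the CPiX index.
    """

    def __init__(self, window: int, slide: int):
        if window % slide != 0:
            raise ValueError("window must be a multiple of slide (assumption of this sketch)")
        self.window = window
        self.slide = slide
        self.partials = defaultdict(int)  # checkpoint id -> partial sum

    def insert(self, timestamp: int, value: int) -> None:
        # Arrival order does not matter: the value is added to the checkpoint its
        # timestamp belongs to, even if events with later timestamps arrived first.
        self.partials[timestamp // self.slide] += value

    def query(self, end: int) -> int:
        # SUM over timestamps in the half-open window [end - window, end).
        if end % self.slide != 0:
            raise ValueError("end must fall on a checkpoint boundary")
        last = end // self.slide
        first = last - self.window // self.slide
        # .get() avoids creating empty checkpoints while reading.
        return sum(self.partials.get(c, 0) for c in range(first, last))

    def evict_before(self, watermark: int) -> None:
        # Drop checkpoints that no window ending at or after `watermark` can use.
        cutoff = (watermark - self.window) // self.slide
        for c in [c for c in self.partials if c < cutoff]:
            del self.partials[c]
```

For example, with `window=6` and `slide=2`, inserting events at timestamps 1, 7, 3, 9, 5 (with 3 and 5 arriving late) still gives the correct sum over [4, 10) from `query(10)`, because each late value lands in its own checkpoint and no window has to be recomputed from raw events.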

Highlights

  • The rapid growth of modern technological devices and social network services increases the flow of unbounded record transmissions in real time

  • This paper focuses on time-based windows because they are the most common in data stream processing

  • We propose CPiX for incremental sliding-window aggregation over out-of-order data streams


Summary

Introduction

The rapid growth of modern technological devices and social network services increases the flow of unbounded record transmissions in real time. Due to the increased prevalence of data streams, many academic and industrial fields have investigated stream processing. One practical issue is how to handle an infinite sequence of streaming records [5], [6], [7], [8], [9], [15] using finite memory (or computational resources). One solution is to use a window to extract partial data from the infinite stream, since a window limits the scope over which an operation is evaluated to a finite subsequence of records. To apply the operation repeatedly, the window is slid along the stream, covering a new range of records each time. This practice is called a sliding window.
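To make the sliding-window idea concrete, the following minimal Python sketch evaluates SUM over a time-based sliding window by recomputing each window from scratch. The function name, the half-open window convention [s, s + window), and the explicit `start`/`end` range are illustrative assumptions; incremental approaches such as the one this paper targets exist precisely to avoid this full recomputation at every slide.

```python
from typing import Iterable, List, Tuple


def sliding_window_sums(
    events: Iterable[Tuple[int, int]], window: int, slide: int, start: int, end: int
) -> List[Tuple[int, int, int]]:
    """Naively compute SUM over each time-based window [s, s + window).

    Windows start at s = start, start + slide, ... and are emitted while
    s + window <= end. Each result is (window_start, window_end, sum).
    Hypothetical helper for illustration only.
    """
    events = list(events)  # allow multiple passes over the input
    results = []
    s = start
    while s + window <= end:
        # The window bounds which records the operation (here SUM) sees.
        total = sum(v for t, v in events if s <= t < s + window)
        results.append((s, s + window, total))
        s += slide  # slide the window forward along the stream
    return results
```

With `window=4`, `slide=2`, events at times 0 to 5 carrying values 1 to 6, and the range [0, 8), this returns `[(0, 4, 10), (2, 6, 18), (4, 8, 11)]`. Because consecutive windows overlap, each record is summed up to `window / slide` times, which is the redundancy that incremental sliding-window aggregation removes.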
