Abstract

High performance stream aggregation is critical for many emerging applications that analyze massive volumes of data. Incoming data needs to be stored in a sliding window during processing, in case the aggregation functions cannot be computed incrementally. Updating the window with new incoming values and reading it to feed the aggregation functions are the two primary steps in stream aggregation. Although window updates can be supported efficiently using multi-level queues, frequent window aggregations remain a performance bottleneck as they put tremendous pressure on the memory bandwidth and capacity. This article addresses this problem by enhancing StreamZip, a dataflow stream aggregation engine that is able to compress the sliding windows. StreamZip deals with a number of data and control dependency challenges to integrate a compressor in the stream aggregation pipeline and alleviate the memory pressure posed by frequent aggregations. In addition, StreamZip incorporates a caching mechanism for dealing with skewed-key distributions in the incoming data stream. In doing so, StreamZip offers higher throughput as well as larger effective window capacity to support larger problems. StreamZip supports diverse compression algorithms offering both lossless and lossy compression to integers as well as floating-point numbers. Compared to designs without compression, StreamZip lossless and lossy designs achieve up to 7.5× and 22× higher throughput, while improving the effective memory capacity by up to 5× and 23×, respectively.

Full Text
Published version (Free)

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call