Abstract

Real-time data processing is often a necessity, since insights lose value when they are discovered off-line or after the fact. However, large-scale stream processing systems are non-trivial to build and deploy. While many frameworks allow users to create large-scale distributed systems, it remains challenging to understand their performance, their deployment cost, and the impact of potential (partial) outages on real-time performance. Our work considers the performance of Cloud-based stream processing systems in terms of back-pressure and expected utilization. We explore the performance of an exemplar stream application on different Cloud-based virtual machine resources, taking the scale of deployment and its cost into consideration alongside overall performance. To this end, we develop an algorithm based on queueing theory that predicts the throughput and latency of stream data processing while preserving system stability. Our methodology for making the underlying measurements applies to mainstream stream processing frameworks such as Apache Storm and Heron, and is especially suitable for large-scale distributed stream processing where jobs run for extended periods. We benchmark the performance of the system on the national research cloud of Australia (Nectar), and present a performance analysis based on estimating the overall effective utilization.
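
To make the queueing-theoretic framing concrete, the sketch below estimates utilization, latency, and stability for a single stream operator under simple M/M/1 assumptions (Poisson arrivals, exponential service times). This is an illustrative assumption on our part, not the paper's algorithm; the function name and the per-operator model are hypothetical.

```python
# Hypothetical sketch: per-operator utilization and latency under an
# M/M/1 queueing model. The paper's actual algorithm is not reproduced
# here; the model choice and names are illustrative assumptions.

def mm1_estimate(arrival_rate: float, service_rate: float) -> dict:
    """Estimate steady-state metrics for one stream operator.

    arrival_rate -- mean tuple arrival rate (tuples/sec), lambda
    service_rate -- mean tuple service rate (tuples/sec), mu
    """
    utilization = arrival_rate / service_rate  # rho = lambda / mu
    if utilization >= 1.0:
        # Unstable regime: the queue grows without bound, so the
        # system exerts back-pressure (or drops tuples) rather than
        # settling at a finite latency.
        return {"utilization": utilization, "stable": False}
    # Mean sojourn time (queueing + service) for an M/M/1 queue:
    # W = 1 / (mu - lambda).
    latency = 1.0 / (service_rate - arrival_rate)
    # Mean number of tuples in the system via Little's law: L = lambda * W.
    mean_in_system = arrival_rate * latency
    return {
        "utilization": utilization,
        "stable": True,
        "latency_s": latency,
        "mean_in_system": mean_in_system,
    }


if __name__ == "__main__":
    # Example: 800 tuples/s arriving at an operator serving 1000 tuples/s
    # gives rho = 0.8 and a mean latency of 5 ms.
    print(mm1_estimate(arrival_rate=800.0, service_rate=1000.0))
```

The stability check (rho < 1) mirrors the abstract's concern with back-pressure: once arrival rate approaches service rate, predicted latency grows sharply, which is the signal for scaling out the deployment.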
