Abstract

Big data applications play a significant role in diverse fields. Distributed Stream Processing Engines (DSPEs) are widely used to support real-time applications efficiently. Partitioning algorithms distribute data streams across multiple nodes so that they can be processed in parallel. For stateful streaming applications that use such partitioning algorithms, aggregation cost is an important factor because it strongly affects performance when the final result is produced. However, the impact of aggregation cost in stream processing has not been discussed comprehensively in the existing literature. We use performance modeling to identify the importance of aggregation cost when the workload is high. We implement the performance model on a multi-node cluster to predict the same behavior as the single-resource performance model. Experimental results show that a stateful streaming application needs more resources than a stateless streaming application when both run on the same DSPE and the workload is high. Further experimental results show that performance modeling may help predict the maximum workload that a DSPE can process, and that increasing the parallelism level is not guaranteed to improve the performance of streaming applications.
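As a rough illustration of why aggregation cost depends on the partitioning algorithm, the sketch below counts words in a small stream under two common partitioning schemes and reports how many partial results must be merged to produce the final answer. It is not the performance model described above; the function names, the use of `crc32` as the hash, and the "number of partial entries merged" cost measure are all assumptions made for this example.

```python
from collections import Counter
from zlib import crc32


def key_grouping(words, n_workers):
    """Route each word by a hash of its key, so one worker owns each key."""
    workers = [Counter() for _ in range(n_workers)]
    for w in words:
        workers[crc32(w.encode()) % n_workers][w] += 1
    return workers


def shuffle_grouping(words, n_workers):
    """Route tuples round-robin, so any worker may hold part of any key."""
    workers = [Counter() for _ in range(n_workers)]
    for i, w in enumerate(words):
        workers[i % n_workers][w] += 1
    return workers


def aggregate(workers):
    """Merge partial counts.

    Returns the merged result and the number of partial entries that had
    to be merged (a hypothetical stand-in for aggregation cost).
    """
    total = Counter()
    merged_entries = 0
    for partial in workers:
        merged_entries += len(partial)
        total.update(partial)
    return total, merged_entries


# Toy stream with four distinct keys, processed by four workers.
stream = ["a", "b", "a", "c", "b", "a", "d", "a"] * 1000
kg_total, kg_cost = aggregate(key_grouping(stream, 4))
sg_total, sg_cost = aggregate(shuffle_grouping(stream, 4))
print("key grouping cost:", kg_cost)
print("shuffle grouping cost:", sg_cost)
```

Both schemes yield the same final counts, but shuffle grouping spreads each key over several workers, so more partial entries have to be merged at the end. In this toy setup the merge step touches one entry per distinct key under key grouping and more than that under shuffle grouping.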
