Abstract

Distributed data stream processing has become an increasingly popular computational paradigm, driven by emerging applications that require real-time data processing, such as dynamic content delivery and security event analysis. These stream processing applications are often run on shared, multi-tenant clusters as companies consolidate from dedicated per-application clusters (batch and streaming) to a single cluster managed by a global cluster manager such as Hadoop YARN. In shared cluster environments, guaranteeing quality-of-service constraints on throughput and response time for both stream processing and batch applications is a significant challenge. Stream processing applications often face elastic demand, where the input rate can vary drastically. The typical remedy for workload elasticity is to reserve enough resources for the application, but this is not feasible when resources are shared among multiple applications. In this paper, we present an approach for elastically scaling distributed data stream processing applications and for efficiently scheduling and coordinating stream processing with batch processing in shared clusters. Our solution consists of a congestion detection monitor that detects bottlenecks in the streaming system and a global state manager that performs non-disruptive, stateful scaling of streaming applications. We implemented our solution on Storm, a popular stream processing framework, and evaluated it on a Hadoop YARN cluster using a real-time security event processing workload. Our experimental results show that our solution improves stream processing application throughput by 49% over default Storm while decreasing average request response times by 58%.
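To make the congestion-detection idea concrete, the following is a minimal sketch of the kind of queue-occupancy check such a monitor might run: it flags a task as a bottleneck only after its receive-queue backlog stays above a threshold for several consecutive samples, so transient bursts do not trigger scaling. The class, method names, threshold parameters, and metric source are illustrative assumptions, not the paper's actual implementation or Storm's API.

import java.util.HashMap;
import java.util.Map;

/**
 * Illustrative sketch of a queue-based congestion check.
 * Assumes per-task receive-queue occupancy is sampled periodically,
 * e.g. from the stream processor's metrics feed.
 */
public class CongestionMonitor {

    private final double occupancyThreshold;            // fraction of queue capacity, e.g. 0.8
    private final int sustainedSamples;                  // consecutive samples before flagging
    private final Map<String, Integer> overCount = new HashMap<>();

    public CongestionMonitor(double occupancyThreshold, int sustainedSamples) {
        this.occupancyThreshold = occupancyThreshold;
        this.sustainedSamples = sustainedSamples;
    }

    /** Record one sample for a task; returns true if the task looks congested. */
    public boolean sample(String taskId, long queuedTuples, long queueCapacity) {
        double occupancy = (double) queuedTuples / queueCapacity;
        if (occupancy >= occupancyThreshold) {
            // Backlog above threshold: count consecutive over-threshold samples.
            int count = overCount.merge(taskId, 1, Integer::sum);
            return count >= sustainedSamples;             // sustained backlog => bottleneck candidate
        }
        overCount.put(taskId, 0);                         // backlog cleared, reset the counter
        return false;
    }
}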
