Abstract

Stream processing frameworks enable real-time processing of continuous, unbounded data streams at high speed. By leveraging the elasticity of cloud computing infrastructure, such frameworks can be offered as Software as a Service (SaaS) to applications in many domains, simplifying development and runtime management. A key issue in making such a SaaS scalable is allocating data processing operators to cluster nodes and balancing the workload dynamically. Because data volume and rate can be unpredictable, a static mapping between operators and cluster resources often results in an unbalanced operator load distribution. This paper proposes an optimization method that combines the correlation of resource utilization of nodes with the capacity of clusters. The associated software components form a layer between the stream processing framework and the cloud clusters and nodes. This layer allows an operator to be transferred dynamically to a different cluster node at runtime while remaining transparent to developers. We present a prototype built on Yahoo's S4 and running on clusters at Emulab.org, and evaluate it with a top-N topic list application on Twitter streams. The results demonstrate improved stream processing throughput and cluster resource utilization.
