Abstract

The era of personal resources being sufficient for enterprise big data computations has passed. As computations are executed in the cloud, small policy changes by cloud operators can cause considerable changes in operational costs. Carefully choosing the amount of resources for a given application is thus of great importance. This, however, requires a priori knowledge of the application's performance under different configurations. A performance prediction model must account for the heterogeneity of resources and the diversity of application workflows. Previous approaches for heterogeneous environments treat the application as a black box, which results in single-purpose models. This paper addresses the problem with two gray-box prediction models based on linear programming (LP) and mixed-integer linear programming (MILP). Given a set of available resources, the models consider Apache Spark applications, represented by the Directed Acyclic Graph (DAG) of their workflow, running on top of a Hadoop YARN cluster. We then propose a configuration recommendation algorithm that optimizes the cost-performance trade-off when renting machine instances. The accuracy of the proposed models is evaluated with real-world executions of several representative applications on the Wikipedia dataset and the TPC-DS benchmark. The average prediction error of only 3.28% demonstrates the practicality of the proposed approach in handling cost-performance trade-offs.
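
To make the cost-performance trade-off concrete, the following minimal sketch picks the cheapest candidate configuration whose predicted runtime meets a deadline. It is an illustration only and not the paper's recommendation algorithm: the `Config` class, the `recommend` function, the deadline criterion, and the simple cost formula (instances × hourly price × predicted runtime) are all assumptions, and the predicted runtimes are taken as given inputs rather than computed by the LP or MILP models.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class Config:
    """Hypothetical candidate cluster configuration (not from the paper)."""
    name: str
    instances: int
    price_per_instance_hour: float   # assumed rental price
    predicted_runtime_hours: float   # assumed output of some prediction model

    @property
    def cost(self) -> float:
        # Assumed cost model: pay per instance for the whole predicted runtime.
        return self.instances * self.price_per_instance_hour * self.predicted_runtime_hours


def recommend(configs: Iterable[Config], deadline_hours: float) -> Optional[Config]:
    """Return the cheapest configuration predicted to finish within the deadline.

    Ties on cost are broken in favor of the shorter predicted runtime.
    Returns None when no configuration meets the deadline.
    """
    feasible = [c for c in configs if c.predicted_runtime_hours <= deadline_hours]
    if not feasible:
        return None
    return min(feasible, key=lambda c: (c.cost, c.predicted_runtime_hours))
```

With illustrative inputs, a larger cluster can be cheaper overall when it shortens the runtime enough, which is why the choice depends on an accurate runtime prediction for each configuration.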
