Abstract

Hadoop is a widely-used software platform for the development, deployment and execution of Big Data applications. Leading technology companies such as Yahoo and Facebook regularly employ Hadoop to process large datasets. Nevertheless, running Hadoop applications with effective performance-cost trade-offs is very challenging due to the large number of Hadoop parameters that need to be appropriately configured. The challenge is compounded by the frequent practice of deploying Hadoop applications on public cloud infrastructure, as this also requires the selection of suitable cloud configuration parameters (e.g., types and number of virtual machines) for each application. To address this challenge, our work-in-progress paper proposes an approach for the multi-objective optimisation of the Hadoop and cloud parameters of Hadoop 2.x applications deployed on public clouds. Our approach uses Hadoop and cloud infrastructure models to synthesise sets of configurations that achieve Pareto-optimal trade-offs between the execution time and the cost of Big Data applications, enabling users to select optimal deployments that meet their time and/or budget constraints.
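To illustrate the Pareto-optimal trade-offs mentioned above, the following minimal Python sketch filters a set of candidate deployments down to the non-dominated ones, i.e., those for which no other candidate is both faster and cheaper. This is not the paper's implementation; the configuration labels, VM types and (execution time, cost) values are hypothetical, and the real approach derives such candidates from Hadoop and cloud infrastructure models rather than a hard-coded list.

from typing import List, Tuple

# (label, execution_time_hours, cost_dollars) -- all values are illustrative.
Config = Tuple[str, float, float]

def pareto_front(configs: List[Config]) -> List[Config]:
    """Keep only non-dominated configurations: a configuration is dropped if
    another one is no worse in both objectives and strictly better in one."""
    front = []
    for c in configs:
        dominated = any(
            o[1] <= c[1] and o[2] <= c[2] and (o[1] < c[1] or o[2] < c[2])
            for o in configs
        )
        if not dominated:
            front.append(c)
    return sorted(front, key=lambda cfg: cfg[1])

if __name__ == "__main__":
    # Hypothetical candidates combining Hadoop settings and VM choices.
    candidates = [
        ("4 x m4.large, 2 reducers", 3.0, 4.0),
        ("4 x m4.large, 1 reducer", 3.5, 4.5),   # dominated by the line above
        ("8 x m4.large, 4 reducers", 1.8, 6.5),
        ("4 x m4.xlarge, 4 reducers", 2.0, 5.5),
        ("2 x m4.large, 2 reducers", 5.5, 3.5),
        ("8 x m4.xlarge, 8 reducers", 1.7, 11.0),
    ]
    for label, hours, cost in pareto_front(candidates):
        print(f"{label}: {hours} h, ${cost}")

A user with a deadline or budget constraint would then pick from this front the configuration that satisfies the constraint at the lowest value of the other objective.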
