Abstract

Big data frameworks such as Apache Spark are becoming prominent for performing large-scale data analytics jobs in various domains. However, due to limited resource availability, local or on-premise computing resources are often not sufficient to run these jobs. Public cloud resources can therefore be rented on a pay-per-use basis from cloud service providers to deploy a Spark cluster entirely on the cloud. Nevertheless, using only cloud resources can be costly. Hence, local and cloud resources are now often used together to deploy a hybrid cloud computing cluster. However, scheduling jobs in a cluster deployed on hybrid clouds is challenging in the presence of various Service-Level Agreement (SLA) demands such as cost minimization and job deadline guarantees. Most existing works consider either a public or a locally deployed cluster and mainly focus on improving job performance in the cluster. In this article, we propose efficient scheduling algorithms that leverage the different Virtual Machine (VM) instance pricing in a hybrid cloud deployed cluster to optimize the VM usage cost for both local and cloud resources and to maximize the percentage of job deadlines met. We have conducted extensive simulation-based experiments to compare our proposed algorithms with the baseline approaches. In addition, we have developed a prototype system on top of the Apache Mesos cluster manager and performed real experiments to evaluate the applicability of our proposed approaches on a real platform with benchmark applications. The results show that our proposed algorithms are highly scalable and reduce the VM usage cost of a hybrid cluster by up to 20 percent.
