Abstract

Cloud computing has emerged as a powerful and promising way to run high performance computing (HPC) jobs. Most HPC jobs follow a multi-process paradigm and involve frequent communication and synchronization among parallel processes. However, because the underlying resources of cloud data centers are shared among multiple tenants, competition among jobs for limited bandwidth leads to unpredictable job completion times, which may cause QoS violations and inefficient resource utilization when parallel jobs are scheduled in the cloud. To tackle this issue, it is essential to provide bandwidth guarantees for parallel jobs running in the cloud. Offering a dedicated virtual cluster (VC) for running applications is a popular way to guarantee bandwidth demands. Motivated by these problems, in this paper we first design a time-aware virtual cluster (TVC) request model for parallel jobs and consider how to embed the requested TVCs into the cloud efficiently under a parallel job scheduling framework. An adaptive bandwidth-aware heuristic algorithm, denoted AdaBa, is proposed to improve the job accept rate by adaptively adjusting the priorities of the servers that accommodate the VMs of a TVC according to the relative size of the requested bandwidth demand. Then, a bandwidth-guaranteed migration and backfilling scheduling algorithm, denoted BgMBF, is designed to schedule parallel jobs, with bandwidth demands guaranteed by AdaBa. To obtain high job responsiveness, a bandwidth-reserved job backfilling strategy is applied when the requested TVC for the currently scheduled job cannot be allocated in the cloud. The migration cost of BgMBF is also considered, and an enhanced version, BgMBFSDF, is proposed to minimize the number of migrations when the execution times of jobs are known.
Through extensive simulation experiments on popular parallel workloads, our proposed TVC embedding algorithm AdaBa improves the accept rate by up to 15 percent compared with existing algorithms such as Oktopus and a greedy algorithm. BgMBF and BgMBFSDF also significantly outperform other popular scheduling algorithms integrated with AdaBa on average response time and average bounded slowdown.

Highlights

  • Based on virtualization, data management techniques, etc., the cloud computing paradigm delivers cost-effective and powerful Infrastructure as a Service (IaaS), and flexible and customized Platform as a Service (PaaS) and Software as a Service (SaaS), which allow agile customization to the specific application, software, and programming environment needs of users

  • When a time-aware virtual cluster (TVC) is accepted by the cloud, N_i VMs are deployed onto the idle slots of servers in the cloud data center, and the residual bandwidth capacities of the links along the paths to those servers are reduced

  • The management and scheduling of parallel jobs in the cloud can be treated as a variant of the job scheduling problem that integrates TVC embedding
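
The acceptance step in the second highlight can be illustrated with a minimal sketch. It is not the paper's AdaBa algorithm: it uses plain first-fit placement, assumes a single switch so that only server uplinks are modeled, and assumes the hose-model rule used by Oktopus-style virtual clusters, in which a link separating m VMs from the remaining N - m VMs must carry min(m, N - m) * B. The names TVCRequest, Server, uplink_demand, and try_embed are hypothetical and chosen for illustration only.

```python
from dataclasses import dataclass


@dataclass
class TVCRequest:
    """Hypothetical TVC request: n_vms VMs, each needing `bandwidth` for `duration`."""
    n_vms: int
    bandwidth: float  # per-VM bandwidth demand (e.g. Mbps)
    duration: float   # estimated run time; carried along but unused in this sketch


@dataclass
class Server:
    free_slots: int          # idle VM slots on this server
    uplink_residual: float   # residual bandwidth on the server's uplink


def uplink_demand(req: TVCRequest, m: int) -> float:
    """Hose-model demand on a link that separates m VMs from the other n_vms - m."""
    return min(m, req.n_vms - m) * req.bandwidth


def try_embed(req: TVCRequest, servers: list[Server]):
    """First-fit placement (an assumption, not AdaBa).

    Returns the number of VMs placed on each server, or None if the request
    is rejected. Server state is modified only when the request is accepted.
    """
    placement = [0] * len(servers)
    remaining = req.n_vms
    for i, server in enumerate(servers):
        if remaining == 0:
            break
        # Try the largest VM count that fits both the slots and the uplink.
        for m in range(min(server.free_slots, remaining), 0, -1):
            if uplink_demand(req, m) <= server.uplink_residual:
                placement[i] = m
                remaining -= m
                break
    if remaining > 0:
        return None  # rejected: leave residual capacities untouched
    for server, m in zip(servers, placement):
        server.free_slots -= m
        server.uplink_residual -= uplink_demand(req, m)
    return placement


servers = [Server(free_slots=4, uplink_residual=1000.0),
           Server(free_slots=4, uplink_residual=1000.0)]
print(try_embed(TVCRequest(n_vms=6, bandwidth=100.0, duration=3600.0), servers))  # [4, 2]
```

Note that when all VMs of a request fit on one server, the uplink demand is zero, which is one way to read the idea of hiding communication inside a server. A time-aware version would also release the reserved slots and bandwidth once the requested duration has elapsed.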


Summary

Introduction

Based on virtualization, data management techniques, etc., the cloud computing paradigm delivers cost-effective and powerful Infrastructure as a Service (IaaS), and flexible and customized Platform as a Service (PaaS) and Software as a Service (SaaS). Earlier work proposed a time-aware virtual cluster request model that specifies an estimated time duration for each job, and designed several online heuristic algorithms to allocate resources for scheduled requests. To improve the accept rate of jobs, the scheduling algorithm tries to pick the most suitable requested jobs to execute in the cloud according to their bandwidth and time-duration profiles. Inspired by Dalvandi et al. [14], we design a time-aware virtual cluster (TVC) request model to specify the resource demands of parallel jobs. The contributions of this paper are as follows: we propose an efficient adaptive bandwidth-aware virtual cluster embedding algorithm that allocates the requested virtual cluster resources for scheduled parallel jobs running in the cloud, and that frees more bandwidth on the links through an adaptive communication-hiding strategy to improve the accept rate for subsequently arriving jobs.

Bandwidth Allocation in the Cloud
Parallel Job Scheduling
Data Center Network Model
Virtual Cluster Model
Problem Definition
Constraints
Proposed Algorithms
Simulation Settings
Workload
Performance of TVC Embedding
Scheduling Performance
Conclusions