Abstract

As cloud applications grow more deadline-sensitive, meeting specific deadlines is becoming increasingly crucial, particularly in shared clusters. A few slow tasks, called stragglers, can adversely affect job execution times. Likewise, inadequate slotting of data analytics applications can lead to inappropriate resource deployment and ultimately degrade system performance. One effective way of tackling stragglers is to launch extra attempts (or clones) for each straggler after a job is submitted. This paper proposes Shed+, an optimization framework based on dynamic speculation that aims to maximize jobs' PoCD (Probability of Completion before Deadline) by making full use of available resources. Notably, our work includes a new online scheduler that dynamically recomputes and reallocates resources during a job's execution. Our findings show that Shed+ successfully leverages cloud resources and maximizes the percentage of jobs meeting their deadlines. In our experiments under heavy load, this percentage reaches up to 98% for Shed+, compared with nearly 68%, 40%, 35%, and 37% for Shed, Dolly, Hopper, and Hadoop with speculation enabled, respectively.
