Abstract

As cloud applications become increasingly deadline-sensitive, meeting deadlines becomes more critical, especially in shared clusters. It has been shown that a few slow tasks, called stragglers, can significantly lengthen job execution times. Moreover, poor scheduling of data analytics applications can lead to inefficient resource usage and ultimately hurt system performance. One way to mitigate stragglers is to launch extra attempts (clones) of each task upon job submission. In this paper, we propose Shed, an optimization framework that leverages dynamic cloning to jointly maximize jobs' Probability of Completion before Deadline (PoCD) by fully utilizing the available resources. Our work includes a novel online scheduler that dynamically recomputes and reallocates resources during a job's execution to maximize PoCD. The results show that Shed is able to leverage cloud resources and maximize the percentage of jobs that meet their deadlines: up to 100% in our experiments, compared with typically around 60% for Dolly, another cloning approach, and around 40% for Hadoop with speculation enabled.
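
To illustrate why cloning can raise PoCD, the following is a minimal sketch under a simplifying assumption that is not taken from the paper: every attempt of a task independently finishes before the deadline with the same probability, and a job meets its deadline only if all of its tasks do. The function names `task_pocd` and `job_pocd` are hypothetical, and this is not Shed's actual optimization model.

```python
from typing import Iterable


def task_pocd(p_attempt: float, clones: int) -> float:
    """Probability that at least one of `clones` attempts meets the deadline.

    Assumes (for illustration only) that attempts are independent and each
    one finishes before the deadline with probability `p_attempt`.
    """
    if not 0.0 <= p_attempt <= 1.0:
        raise ValueError("p_attempt must be in [0, 1]")
    if clones < 1:
        raise ValueError("each task needs at least one attempt")
    # The task misses the deadline only if every attempt misses it.
    return 1.0 - (1.0 - p_attempt) ** clones


def job_pocd(p_attempt: float, clones_per_task: Iterable[int]) -> float:
    """PoCD of a job whose tasks must all finish before the deadline."""
    result = 1.0
    for clones in clones_per_task:
        result *= task_pocd(p_attempt, clones)
    return result


if __name__ == "__main__":
    # Two-task job, each attempt meets the deadline with probability 0.5.
    print(job_pocd(0.5, [1, 1]))  # no extra clones
    print(job_pocd(0.5, [2, 2]))  # one extra clone per task
```

Under these assumptions, each extra clone raises a task's chance of meeting the deadline but also consumes a slot, which is the trade-off a scheduler has to balance across jobs that share a cluster.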
