Abstract
We investigate the scheduling problem that arises when parallel applications execute on a network of machines in cycle-stealing mode. In this mode, a parallel application runs its tasks on machines only while they are idle; when a machine's owner reclaims it, the tasks running there must relinquish control immediately. The application therefore risks losing the work in progress on reclaimed machines, and its total execution time grows because the pre-empted tasks must be rescheduled. We first evaluate the performance impact on an application in two scenarios: a set of N dedicated machines, and a set of N non-dedicated machines (on which pre-emption may occur). This study shows that losing machines can considerably increase the application's execution time, so we propose and evaluate three simple strategies to alleviate the problem. All three strategies use additional machines, but they differ in how those extra machines are used. In the first strategy, the additional machines are simply added to the common pool of machines used by the application. The other two are based on task replication: the additional machines execute copies of certain tasks that are already running on other machines.
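The effect of pre-emption and of the extra-machine strategies can be explored with a small discrete-time simulation. The sketch below is not the authors' simulator. The unit-step time model, the `(machine, start, end)` reclaim intervals, the assumption that no checkpointing is done, and the replication policy are all illustrative assumptions. Under that policy, an otherwise idle extra machine copies the single-copy running task with the most remaining work, and the first copy to finish wins. The function names `available` and `simulate` are hypothetical. The paper's two replication strategies may choose which tasks to replicate differently.

```python
def available(machine, t, reclaims):
    """True unless the machine's owner has reclaimed it at time step t."""
    return not any(s <= t < e for (m, s, e) in reclaims if m == machine)


def simulate(task_lengths, n_machines, reclaims=(), replicate=False, horizon=10_000):
    """Return the makespan (in time steps) of a bag of independent tasks.

    Hypothetical model: task_lengths are positive integers; reclaims holds
    (machine, start, end) intervals during which that machine is unavailable.
    Work in progress on a reclaimed machine is lost (no checkpointing).
    """
    queue = list(range(len(task_lengths)))  # ids of tasks waiting to run
    running = {}  # machine -> [task_id, remaining_steps]
    done = set()
    t = 0
    while len(done) < len(task_lengths):
        if t >= horizon:
            raise RuntimeError("no progress within horizon")
        # 1. Pre-emption: evict tasks from machines reclaimed by their owners.
        for m in list(running):
            if not available(m, t, reclaims):
                tid, _ = running.pop(m)
                # Re-queue only if no replica of this task survives elsewhere.
                if all(job[0] != tid for job in running.values()):
                    queue.append(tid)
        # 2. Give work to idle machines that are currently available.
        for m in range(n_machines):
            if m in running or not available(m, t, reclaims):
                continue
            if queue:
                tid = queue.pop(0)
            elif replicate and running:
                # Replicate the single-copy task with the most remaining work.
                copies = {}
                for job_tid, _ in running.values():
                    copies[job_tid] = copies.get(job_tid, 0) + 1
                singles = [(job_tid, rem) for job_tid, rem in running.values()
                           if copies[job_tid] == 1]
                if not singles:
                    continue
                tid = max(singles, key=lambda job: job[1])[0]
            else:
                continue
            running[m] = [tid, task_lengths[tid]]  # a new copy starts from scratch
        # 3. Advance one time step; the first copy of a task to finish wins.
        for job in running.values():
            job[1] -= 1
            if job[1] == 0:
                done.add(job[0])
        for m in list(running):
            if running[m][0] in done:
                del running[m]  # cancel finished tasks and their replicas
        t += 1
    return t
```

For example, with two 4-step tasks, reclaiming machine 0 during steps [2, 3) stretches the makespan from 4 to 7 steps on two machines (`simulate([4, 4], 2, reclaims=[(0, 2, 3)])`). Adding one extra machine to the pool brings it to 6, and using that machine for a replica brings it back to 4. These toy numbers only illustrate the mechanism and are not results from the study.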