Abstract

Dedicated infrastructures are commonly used for urgent computations. However, dedicated resources are not always affordable due to budget constraints. As a result, shared infrastructures become an alternative for urgent computations. Since these infrastructures are meant to serve many users, urgent jobs may arrive while regular jobs are using the necessary resources. In such a case, the regular jobs must be preempted so that the urgent jobs can be executed immediately. Most conventional job scheduling methods have focused on reducing the response times and waiting times of all jobs. However, these methods can delay urgent jobs and prevent them from completing within a stipulated deadline. Furthermore, in heterogeneous systems with coprocessors, preemption becomes more difficult because coprocessors rely on several system software functionalities provided by the host processor. In this paper, we propose a parallel job scheduling method to effectively use shared heterogeneous systems for urgent computations. Our method employs an in-memory process swapping mechanism to preempt jobs running on the coprocessor devices. The results of our simulations show that our method can achieve a significant reduction in the response time and slowdown of regular jobs without substantial delays of urgent jobs.

Highlights

  • Large-scale scientific computing has played an important role in critical decision-making systems

  • Parallel job scheduling metrics: the paper considers the workload as the data regarding each job submitted to an HPC center during a certain time period

  • Preemptive job scheduling for urgent computing: the paper describes the proposed job scheduling method, which addresses the problems in enabling shared heterogeneous systems for urgent computing (UC)

Summary

INTRODUCTION

Large-scale scientific computing has played an important role in critical decision-making systems. In shared infrastructures supporting urgent computing (UC), it is important to prevent delays of urgent jobs while decreasing the response time of regular jobs. In a heterogeneous system, a CPU hosts other kinds of processors, such as the vector processors considered in this paper. For this kind of system, resource providers need to handle the preemption of jobs running on coprocessor devices. The evaluation results show that the proposed method can achieve almost no slowdown of urgent jobs, with a response time and slowdown of regular jobs comparable to those of existing backfilling-based job scheduling algorithms. These results demonstrate the importance of the proposed method in enabling shared infrastructures for supporting UC. To the best of our knowledge, this work is the first to consider a new job scheduling method and a preemption mechanism for shared heterogeneous systems supporting UC.
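
The response time and slowdown mentioned above are not defined in this summary. The sketch below assumes the definitions commonly used in parallel job scheduling: response time is completion time minus submission time, and slowdown is response time divided by execution time. The `Job` fields and the `mean_metrics` helper are hypothetical names introduced for illustration. They are not taken from the paper, and its exact metric definitions may differ.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class Job:
    """One workload record (hypothetical layout, times in seconds)."""
    submit: float          # time the job was submitted
    finish: float          # time the job completed
    run_time: float        # time spent executing, excluding any suspension
    urgent: bool = False   # True for urgent jobs, False for regular jobs

    @property
    def response_time(self) -> float:
        # Assumed definition: total time from submission to completion.
        return self.finish - self.submit

    @property
    def slowdown(self) -> float:
        # Assumed definition: response time relative to execution time.
        # A value of 1.0 means the job was never delayed or suspended.
        return self.response_time / self.run_time


def mean_metrics(jobs, urgent):
    """Return (mean response time, mean slowdown) for one job class."""
    group = [j for j in jobs if j.urgent == urgent]
    if not group:
        raise ValueError("no jobs in the requested class")
    return (mean(j.response_time for j in group),
            mean(j.slowdown for j in group))


# Made-up example workload: two regular jobs and one urgent job.
jobs = [
    Job(submit=0, finish=100, run_time=100),
    Job(submit=10, finish=160, run_time=50),   # waited or was suspended
    Job(submit=20, finish=50, run_time=30, urgent=True),
]

regular = mean_metrics(jobs, urgent=False)
urgent = mean_metrics(jobs, urgent=True)
```

Reporting the two job classes separately reflects the goal stated above: the slowdown of urgent jobs should stay close to 1, while the metrics of regular jobs should stay comparable to those under backfilling-based scheduling.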

RELATED WORK
A MOTIVATING EXAMPLE WITH TSUNAMI SIMULATIONS
PREEMPTIVE JOB SCHEDULING FOR URGENT COMPUTING
THE URGENT LATENESS METRIC
Findings
EXPERIMENTAL EVALUATION