Abstract

Remote GPU execution has been shown to increase GPU occupancy and reduce job waiting time in multi-GPU batch-queue systems by allowing jobs to use remote GPUs when not enough local GPUs are unoccupied. However, for GPU-communication-intensive applications, remote-GPU communication overhead can account for more than 70% of execution time. Moreover, a job needs a remote GPU only while its assigned node lacks enough free local GPUs; a local GPU may become available later. We propose mrCUDA, a middleware for migrating execution from a remote GPU to a local GPU on demand. Our evaluation shows that for long-running jobs, mrCUDA's overhead accounts for less than 1% of total execution time. In addition, applying mrCUDA to first-come-first-serve (FCFS) job scheduling reduced job lifetimes (waiting plus execution time) by as much as 30% on average without changing the scheduling policy.
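The scheduling benefit described above can be illustrated with a minimal sketch. This is a simplified model under stated assumptions, not the paper's implementation: the function name `schedule_fcfs` and the single-node resource model are hypothetical. It shows why strict FCFS blocks the queue when the head job cannot fit on free local GPUs, whereas starting such jobs on remote GPUs (with mrCUDA-style on-demand migration back to local GPUs) lets later jobs proceed.

```python
from collections import deque

class Node:
    """A node with a fixed number of free local GPUs (simplified model)."""
    def __init__(self, local_gpus):
        self.free_local = local_gpus

def schedule_fcfs(node, queue, allow_remote=False):
    """Dispatch jobs from an FCFS queue.

    Returns (local_jobs, remote_jobs). With allow_remote=False, the
    head-of-line job blocks the queue when it cannot fit locally.
    With allow_remote=True, such a job starts on remote GPUs instead;
    mrCUDA would later migrate it to local GPUs on demand.
    """
    local, remote = [], []
    while queue:
        job = queue[0]
        if job["gpus"] <= node.free_local:
            node.free_local -= job["gpus"]
            local.append(queue.popleft()["name"])
        elif allow_remote:
            remote.append(queue.popleft()["name"])  # start remotely now
        else:
            break  # strict FCFS: wait for local GPUs to free up
    return local, remote

if __name__ == "__main__":
    jobs = [{"name": "A", "gpus": 2}, {"name": "B", "gpus": 4}, {"name": "C", "gpus": 1}]
    # Plain FCFS: B needs 4 GPUs, only 1 is free after A, so B and C both wait.
    print(schedule_fcfs(Node(3), deque(jobs), allow_remote=False))
    # With remote execution: B starts remotely, so C can still run locally.
    print(schedule_fcfs(Node(3), deque(jobs), allow_remote=True))
```

The second call dispatches every job immediately, which is the source of the reduced waiting times the abstract reports; mrCUDA's contribution is letting the remotely started job move back to a local GPU once one frees up.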

