Abstract

Research and development of deep learning (DL) applications often involves exhaustive trial-and-error, which demands that shared computational resources, especially GPUs, be allocated efficiently. Most DL tasks are moldable or malleable (i.e., the number of allocated GPUs can be changed before or during execution). However, conventional batch schedulers do not take advantage of DL tasks' moldability/malleability, inhibiting speedup when some GPU resources are unallocated. Another opportunity for speedup is to run multiple tasks concurrently on one GPU, which may improve overall throughput because a single task does not always fully utilize a GPU's computational resources. We propose designing a batch scheduling system that exploits these opportunities to accelerate DL tasks. As a first step, we conduct an extensive case study to evaluate the speedup of DL tasks when a scheduler treats them as moldable or malleable. That is, the scheduler adjusts the number of GPUs to be (or already) allocated to a task in response to the fluctuating availability of GPUs. Simulations using our real workload trace show that if the scheduler can allocate 1–4 GPUs to a task or assign 1–4 tasks to a GPU, then the average flow time of moldable/malleable DL tasks is shortened by at least 15.1%/42.5%, respectively, compared to a Rigid FCFS schedule in which one GPU is allocated to each task.
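To make the moldable-scheduling idea concrete, the following is a minimal sketch (not the authors' simulator) of an FCFS scheduler that chooses the number of GPUs (1–4) for each task at start time based on current availability and reports the average flow time. The cluster size, speedup model, and workload below are illustrative assumptions, not values from the paper.

```python
# Minimal sketch of a moldable FCFS scheduling simulation.
# Assumptions (not from the paper): 8-GPU cluster, up to 4 GPUs per task,
# a fixed sub-linear speedup model, and a synthetic workload.
import heapq
from dataclasses import dataclass

TOTAL_GPUS = 8
MAX_GPUS_PER_TASK = 4

@dataclass
class Task:
    arrival: float
    work: float  # processing time on a single GPU

def speedup(gpus: int) -> float:
    # Assumed scaling factors; real DL tasks scale differently.
    return {1: 1.0, 2: 1.8, 3: 2.5, 4: 3.1}[gpus]

def simulate_moldable_fcfs(tasks):
    """Return the average flow time (completion - arrival) over all tasks."""
    tasks = sorted(tasks, key=lambda t: t.arrival)
    free = TOTAL_GPUS
    now = 0.0
    running = []  # min-heap of (finish_time, gpus_held)
    flow_times = []
    for t in tasks:
        now = max(now, t.arrival)
        # Release GPUs from tasks that have already finished.
        while running and running[0][0] <= now:
            _, g = heapq.heappop(running)
            free += g
        # If no GPU is free, wait for the next completion.
        while free == 0:
            finish, g = heapq.heappop(running)
            now = finish
            free += g
        # Moldable decision: take as many free GPUs as allowed.
        g = min(free, MAX_GPUS_PER_TASK)
        free -= g
        finish = now + t.work / speedup(g)
        heapq.heappush(running, (finish, g))
        flow_times.append(finish - t.arrival)
    return sum(flow_times) / len(flow_times)

if __name__ == "__main__":
    workload = [Task(arrival=i * 2.0, work=10.0) for i in range(10)]
    print("avg flow time:", simulate_moldable_fcfs(workload))
```

The rigid baseline described in the abstract corresponds to the same loop with the GPU count fixed at one per task; a malleable scheduler would additionally resize tasks that are already running when GPUs become free or scarce.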
