Victream

Jun Suzuki, Takashi Takenaka, Masaru Kitsuregawa, Yuki Hayashi, Masaki Kan, Shinya Miyakawa, Takuya Araki

doi:10.1145/3148055.3148059

Abstract

In data-parallel computing on graphics processing units (GPUs), processing large data requires multiple GPUs in one computer to increase execution performance. Computing frameworks based on a directed acyclic graph (DAG) have made it possible to raise processing performance by using multiple computing resources. However, their performance degrades in out-of-core processing, which often occurs when large data are processed on GPUs with limited memory capacity. The GPU data input/output (I/O) needed to swap data between host memory and GPU memory during the execution of a user DAG is usually the performance bottleneck. A computing framework called Victream is proposed to overcome this drawback. It uses a novel scheduler that combines two methods to minimize the total amount of GPU data I/O caused by data swapping. First, it performs locality-aware scheduling: when it schedules a task, it selects the one that requires the minimum amount of data swapping and reuses as much of the data already residing in GPU memory as possible. Second, it extends the locality-aware scheduling so that GPUs can prefetch data. Prefetching data that have been swapped out from a GPU enables efficient use of the bottleneck GPU I/O resources. Prefetching the input data of future tasks requires that the schedule of those tasks be determined in advance. Victream's scheduler (hereafter, the Victream scheduler) therefore extends the locality-aware scheduling to schedule future tasks as well, so that data prefetching is executed in a way that minimizes the amount of data I/O caused by data swapping. Evaluation of a Victream prototype showed that its performance is better than that of conventional frameworks by up to 117%.
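The two scheduling ideas described in the abstract can be illustrated with a small Python sketch. It is not the Victream scheduler: the names `Task`, `swap_in_bytes`, `pick_task`, and `plan`, the block sizes, and the greedy tie-breaking are all assumptions made for illustration, and the sketch ignores eviction, multiple GPUs, and DAG dependencies. It only shows how a ready task might be chosen by the amount of data it would need to swap in, and how fixing the order of future tasks reveals which blocks could be prefetched.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Task:
    name: str
    inputs: frozenset  # names of the data blocks the task reads (hypothetical)


def swap_in_bytes(task, resident, sizes):
    """Bytes that must be copied from host to GPU before `task` can run."""
    return sum(sizes[b] for b in task.inputs if b not in resident)


def pick_task(ready, resident, sizes):
    """Locality-aware choice: the ready task needing the least swap-in.

    Ties go to the task that appears first in `ready`.
    """
    return min(ready, key=lambda t: swap_in_bytes(t, resident, sizes))


def plan(ready, resident, sizes):
    """Greedily order tasks and list the blocks to prefetch before each one."""
    resident = set(resident)  # work on a copy; the caller's set is untouched
    pending = list(ready)
    schedule = []
    while pending:
        task = pick_task(pending, resident, sizes)
        pending.remove(task)
        missing = sorted(task.inputs - resident)
        schedule.append((task.name, missing))
        resident |= task.inputs  # simplification: nothing is ever evicted
    return schedule


# Illustrative data only: block sizes in arbitrary units.
sizes = {"a": 4, "b": 4, "c": 8, "d": 2}
resident = {"a", "b"}  # blocks assumed to be in GPU memory already
ready = [
    Task("t1", frozenset({"c", "d"})),
    Task("t2", frozenset({"a", "c"})),
    Task("t3", frozenset({"a", "b", "d"})),
]

chosen = pick_task(ready, resident, sizes)
print(chosen.name, swap_in_bytes(chosen, resident, sizes))  # t3 2
print(plan(ready, resident, sizes))
```

In this toy setting, `t3` is chosen first because two of its three inputs are already resident, and the planned order tells the runtime that block `c` will be needed next, so its transfer could overlap with the execution of `t3`.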

Full Text