Abstract

The graphics processing unit (GPU) is used extensively in diverse domains such as finance, machine learning, and image processing. Nevertheless, a GPU can be underutilized because multiple applications may be unable to share it concurrently owing to memory oversubscription. For example, when applications that require little compute but a large amount of GPU memory run at the same time, the GPU memory may be insufficient; consequently, the number of applications that can run simultaneously is restricted, decreasing GPU utilization. Worse, memory oversubscription can even abort applications that are already running on the GPU. To address this, we propose FlexGPU, a framework that schedules the kernels of applications sharing the same GPU according to their characteristics. The framework 1) schedules each kernel at launch time according to its characteristics to improve GPU utilization and 2) temporarily checkpoints non-dependent content from GPU memory to host memory and restores it later, which avoids GPU memory oversubscription when an out-of-memory failure would otherwise occur and allows more kernels to run concurrently. Experimental results show that, compared with existing methods, our approach achieves a 7-fold improvement in execution time and supports 2.5 times as many concurrently executing applications.
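To make the second mechanism concrete, the following is a minimal sketch, not FlexGPU's actual implementation, of the checkpoint/restore idea using the CUDA runtime API: a buffer that no pending kernel depends on is copied to host memory and freed on the device, then reallocated and copied back before a dependent kernel launches. The `Checkpointable` structure and helper functions are illustrative names introduced here, not part of the paper.

```cuda
// Sketch of checkpointing a non-dependent GPU buffer to host memory and
// restoring it later (assumed mechanism, not the paper's implementation).
#include <cuda_runtime.h>
#include <cstdio>
#include <cstdlib>

#define CHECK(call)                                                   \
    do {                                                              \
        cudaError_t err = (call);                                     \
        if (err != cudaSuccess) {                                     \
            fprintf(stderr, "CUDA error %s at %s:%d\n",               \
                    cudaGetErrorString(err), __FILE__, __LINE__);     \
            exit(EXIT_FAILURE);                                       \
        }                                                             \
    } while (0)

struct Checkpointable {
    void  *dev;    // device buffer (NULL while checkpointed)
    void  *host;   // pinned host staging buffer
    size_t bytes;  // buffer size
};

// Evict a buffer no pending kernel depends on: device -> host, free device memory.
void checkpoint_to_host(Checkpointable *b) {
    CHECK(cudaMemcpy(b->host, b->dev, b->bytes, cudaMemcpyDeviceToHost));
    CHECK(cudaFree(b->dev));
    b->dev = NULL;
}

// Bring the buffer back before a kernel that depends on it is launched.
void restore_to_device(Checkpointable *b) {
    CHECK(cudaMalloc(&b->dev, b->bytes));
    CHECK(cudaMemcpy(b->dev, b->host, b->bytes, cudaMemcpyHostToDevice));
}

int main(void) {
    Checkpointable buf;
    buf.bytes = 64 << 20;                        // 64 MiB example buffer
    CHECK(cudaMalloc(&buf.dev, buf.bytes));
    CHECK(cudaMallocHost(&buf.host, buf.bytes)); // pinned memory for fast copies
    CHECK(cudaMemset(buf.dev, 0xAB, buf.bytes));

    checkpoint_to_host(&buf);  // frees GPU memory so other kernels can run
    restore_to_device(&buf);   // restores state before a dependent kernel launches

    CHECK(cudaFree(buf.dev));
    CHECK(cudaFreeHost(buf.host));
    return 0;
}
```

Freeing the evicted buffer on the device is what creates headroom for additional concurrent kernels; the pinned host staging buffer keeps the checkpoint and restore copies fast.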
