Abstract
The increasing incorporation of Graphics Processing Units (GPUs) as accelerators has been one of the foremost trends in High Performance Computing (HPC) and provides unprecedented performance; however, the prevalent adoption of the Single-Program Multiple-Data (SPMD) programming model brings with it challenges of resource underutilization. Under SPMD, every CPU process needs GPU capability available to it, but since CPUs generally outnumber GPUs, this asymmetric resource distribution leads to overall underutilization of computing resources. In this paper, we propose efficient GPU sharing under SPMD and formally define a series of GPU sharing scenarios. We provide a performance-modeling analysis for each sharing scenario, validated by accurate experimentation. On this modeling basis, we further conduct experimental studies that explore potential GPU sharing efficiency improvements from multiple perspectives, and present both theoretical and experimental GPU sharing performance analysis and results. Our results demonstrate not only a significant performance gain for SPMD programs with the proposed efficient GPU sharing, but also further improved sharing efficiency from optimization techniques grounded in our accurate modeling.
Highlights
Recent years have seen the proliferation of Graphics Processing Units (GPUs) as application accelerators in High Performance Computing (HPC) systems, driven by rapid advances in graphics processing technology and the introduction of programmable processors in GPUs, known as GPGPU (General-Purpose Computation on Graphics Processing Units) [1]
A series of GPU sharing execution models have been introduced for each of the sharing scenarios, and we provide a theoretical prediction of the attainable performance gain over the non-sharing scenario
Initial performance benchmarking was conducted to validate the accuracy of the proposed sharing-scenario modeling, followed by detailed performance analysis of each sharing scenario using varied benchmark profiles
Summary
Recent years have seen the proliferation of Graphics Processing Units (GPUs) as application accelerators in High Performance Computing (HPC) systems, driven by rapid advances in graphics processing technology and the introduction of programmable processors in GPUs, known as GPGPU (General-Purpose Computation on Graphics Processing Units) [1]. A wide range of HPC systems have incorporated GPUs to accelerate applications, exploiting the unprecedented floating-point performance and massively parallel processor architectures of modern GPUs. From an overall perspective, our proposed approach to GPU sharing is to launch multiple GPU kernels from multiple processes/threads using CUDA streaming execution within a single GPU context; the single-context requirement is met by launching kernels from a single process, as in our virtualization implementation. Multiple optimization perspectives are considered for the different sharing scenarios, ranging from the problem/kernel size and parallelism of the SPMD program to optimizable sharing scenarios. Based on these factors, we provide an experimental optimization analysis and achieve optimized I/O concurrency for kernels under Time Sharing and better Streaming Multiprocessor (SM) utilization.
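The stream-based sharing approach described above can be sketched as follows. This is an illustrative example, not the paper's implementation: the kernel, buffer sizes, and the choice of one stream per sharing SPMD rank are all assumptions. It shows the core mechanism the summary names — several independent kernels submitted on distinct CUDA streams within a single GPU context, leaving the hardware free to overlap or time-share them on the SMs.

```cuda
// Hedged sketch: concurrent kernel submission on CUDA streams in one context.
#include <cstdio>
#include <cuda_runtime.h>

// Trivial per-element kernel standing in for a real SPMD workload.
__global__ void scale(float *data, int n, float factor) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] *= factor;
}

int main() {
    const int kStreams = 4;      // assumption: one stream per sharing SPMD rank
    const int n = 1 << 20;
    cudaStream_t streams[kStreams];
    float *buf[kStreams];

    for (int s = 0; s < kStreams; ++s) {
        cudaStreamCreate(&streams[s]);
        cudaMalloc(&buf[s], n * sizeof(float));
    }

    // Kernels on distinct streams have no ordering dependency between them,
    // so the device may co-schedule or time-share them across its SMs.
    for (int s = 0; s < kStreams; ++s)
        scale<<<(n + 255) / 256, 256, 0, streams[s]>>>(buf[s], n, 2.0f);

    cudaDeviceSynchronize();     // wait for all streams before cleanup
    for (int s = 0; s < kStreams; ++s) {
        cudaFree(buf[s]);
        cudaStreamDestroy(streams[s]);
    }
    return 0;
}
```

Note that true concurrency additionally requires all kernels to share one GPU context; when the launching entities are separate processes, a single proxy process (as in the paper's virtualization implementation) must issue the launches on their behalf.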