Abstract

The increasing incorporation of Graphics Processing Units (GPUs) as accelerators has been at the forefront of High Performance Computing (HPC) trends and delivers unprecedented performance; however, the prevalent Single-Program Multiple-Data (SPMD) programming model brings with it challenges of resource underutilization. Under SPMD, every CPU needs GPU capability available to it, but since CPUs generally outnumber GPUs, this asymmetric resource distribution leads to overall underutilization of computing resources. In this paper, we propose efficient GPU sharing under SPMD and formally define a series of GPU sharing scenarios. We provide a performance-modeling analysis for each sharing scenario, validated by careful experimentation. Building on this modeling basis, we conduct experimental studies to explore potential GPU sharing efficiency improvements from multiple perspectives. Further theoretical and experimental analyses of GPU sharing performance are presented. Our results demonstrate not only the significant performance gain that the proposed efficient GPU sharing provides for SPMD programs, but also the additional sharing efficiency achieved with optimization techniques grounded in our model.

Highlights

  • GPUs have proliferated as application accelerators in High Performance Computing (HPC) systems, driven by rapid advances in graphics processing technology and the introduction of programmable processors in GPUs, known as GPGPU, or General-Purpose Computation on Graphics Processing Units [1]

  • A GPU sharing execution model is introduced for each sharing scenario, together with a theoretical prediction of the attainable performance gain over the non-sharing scenario

  • Initial performance benchmarking was conducted to validate the accuracy of the proposed sharing scenario modeling, followed by the detailed performance analysis for each of the sharing scenarios using varied benchmark profiles

Summary

Introduction

Recent years have seen the proliferation of Graphics Processing Units (GPUs) as application accelerators in High Performance Computing (HPC) systems, driven by rapid advances in graphics processing technology and the introduction of programmable processors in GPUs, a development known as GPGPU, or General-Purpose Computation on Graphics Processing Units [1]. A wide range of HPC systems have incorporated GPUs to accelerate applications by exploiting the unprecedented floating-point performance and massively parallel architectures of modern GPUs.

From an overall perspective, our proposed approach achieves GPU sharing by launching multiple GPU kernels from multiple processes/threads using CUDA stream execution within a single GPU context; the single-context requirement is met by launching the kernels from a single process, as in our virtualization implementation. Multiple perspectives of optimization are considered for the different sharing scenarios, ranging from the problem/kernel size and the parallelism of the SPMD program to the sharing scenarios themselves. Based on these factors, we provide an experimental optimization analysis and achieve optimized I/O concurrency for kernels under Time Sharing, as well as better Streaming Multiprocessor (SM) utilization.
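To make the stream-based sharing idea concrete, the following is a minimal CUDA sketch (not the authors' implementation) of launching independent kernels into separate streams within a single GPU context; with kernels in distinct non-default streams, the device is free to overlap or space-share their execution across SMs. The kernel, buffer sizes, and stream count here are illustrative assumptions. Building and running it requires an NVIDIA GPU and the CUDA toolkit (`nvcc`).

```cuda
#include <cuda_runtime.h>

// Illustrative kernel: scales each element of a buffer by a constant.
__global__ void scale(float *d, float a, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) d[i] *= a;
}

int main() {
    const int n = 1 << 20;          // illustrative problem size
    const int kStreams = 4;         // one stream per logical SPMD task
    cudaStream_t streams[kStreams];
    float *buf[kStreams];

    for (int s = 0; s < kStreams; ++s) {
        cudaStreamCreate(&streams[s]);
        cudaMalloc(&buf[s], n * sizeof(float));
        cudaMemsetAsync(buf[s], 0, n * sizeof(float), streams[s]);
        // Each kernel is enqueued on its own stream within the same
        // context, so the hardware may execute them concurrently
        // (space sharing) or interleave them (time sharing).
        scale<<<(n + 255) / 256, 256, 0, streams[s]>>>(buf[s], 2.0f, n);
    }

    cudaDeviceSynchronize();        // wait for all streams to drain
    for (int s = 0; s < kStreams; ++s) {
        cudaFree(buf[s]);
        cudaStreamDestroy(streams[s]);
    }
    return 0;
}
```

Kernels issued to the default stream would serialize; placing each task's work on its own non-default stream is what exposes the concurrency that the sharing scenarios discussed in this paper exploit.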

Related Work
Background of GPU Computing
Programming Models
An Architectural Model
GPU Device-Level Execution Flow
GPU Sharing Approach with Streams for SPMD Programs
GPU Sharing Scenarios
Exclusive Space Sharing
Non-Exclusive Space Sharing
Time Sharing
GPU Sharing and Execution Model
Execution Model for Compute-Intensive Applications
Theoretical Performance Gains
Experimental Analysis and Performance Results
Experimental Validation of the Sharing Model
Performance Prediction from the Model
Sharing Efficiency Exploration and Improvement Potential Analysis
Sharing Scenario Casting
Performance Gains with GPU Sharing for SPMD Programs
Findings
Conclusions