Ballooning Graphics Memory Space in Full GPU Virtualization Environments

Younghun Park, Sungyong Park, Minwoo Gu

doi:10.1155/2019/5240956

Abstract

Advances in virtualization technology have enabled multiple virtual machines (VMs) to share resources in a physical machine (PM). With the widespread use of graphics-intensive applications, such as two-dimensional (2D) or 3D rendering, many graphics processing unit (GPU) virtualization solutions have been proposed to provide high-performance GPU services in a virtualized environment. Although elasticity is one of the major benefits in this environment, the allocation of GPU memory is still static in the sense that after the GPU memory is allocated to a VM, it is not possible to change the memory size at runtime. This causes underutilization of GPU memory or performance degradation of a GPU application due to the lack of GPU memory when an application requires a large amount of GPU memory. In this paper, we propose a GPU memory ballooning solution called gBalloon that dynamically adjusts the GPU memory size at runtime according to the GPU memory requirement of each VM and the GPU memory sharing overhead. The gBalloon extends the GPU memory size of a VM by detecting performance degradation due to the lack of GPU memory. The gBalloon also reduces the GPU memory size when the overcommitted or underutilized GPU memory of a VM creates additional overhead for the GPU context switch or the CPU load due to GPU memory sharing among the VMs. We implemented the gBalloon by modifying the gVirt, a full GPU virtualization solution for Intel’s integrated GPUs. Benchmarking results show that the gBalloon dynamically adjusts the GPU memory size at runtime, which improves the performance by up to 8% against the gVirt with 384 MB of high global graphics memory and 32% against the gVirt with 1024 MB of high global graphics memory.

Highlights

Running graphics-intensive applications that include three-dimensional (3D) visualization and rendering in a virtualized environment creates a new challenge for high-performance graphics processing unit (GPU) virtualization solutions.
The GPU context switch time increases as the GPU memory size of each virtual machine (VM) gets larger [22]. Third, as we reported in a previous study [22], a small GPU memory size degrades the performance of GPU workloads, especially when VMs run graphics operations for rendering or high-resolution display environments.
Using various CPU and GPU benchmarks, we show that the gBalloon dynamically adjusts the GPU memory size at runtime, improving performance by up to 8% against the gVirt with 384 MB of high global graphics memory and 32% against the gVirt with 1024 MB of high global graphics memory.

Summary

Introduction

Running graphics-intensive applications that include three-dimensional (3D) visualization and rendering in a virtualized environment creates a new challenge for high-performance graphics processing unit (GPU) virtualization solutions. Several studies [23,24,25,26,27,28,29] dynamically adjusted the memory allocation of existing VMs. In these studies, memory was taken from a specific VM and allocated to another VM when the physical memory was insufficient. Because these approaches assume an environment in which each VM has an independent virtual address space, it is difficult to apply them directly to the full GPU virtualization environment, in which the same virtual GPU memory space is shared. Special care has to be taken to reduce the memory copy overhead across the system bus if the gBalloon is implemented over discrete GPUs. As the gVirt is open source and access to the source code of the NVIDIA driver and runtime is limited, we decided to use the gVirt as a software platform to verify our proposed idea.

Background and Motivation

Design of gBalloon
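
The abstract describes the gBalloon policy only at a high level: extend a VM's GPU memory when performance degradation due to lack of GPU memory is detected, and reduce it when overcommitted or underutilized memory adds context-switch or CPU overhead. The following is a minimal sketch of that decision rule, not the authors' implementation. Every name, threshold, and step size in it (`VmStats`, `next_allocation`, `STEP_MB`, `MIN_MB`, `MAX_MB`, `overhead_limit`) is a hypothetical placeholder chosen for illustration.

```python
from dataclasses import dataclass

# Hypothetical parameters; the source does not specify step sizes or bounds.
STEP_MB = 64     # amount by which an allocation grows or shrinks per decision
MIN_MB = 128     # assumed lower bound on a VM's GPU memory
MAX_MB = 1024    # assumed upper bound on a VM's GPU memory


@dataclass
class VmStats:
    allocated_mb: int        # GPU memory currently assigned to the VM
    used_mb: int             # GPU memory the VM's workload actually uses
    starved: bool            # degradation due to lack of GPU memory was detected
    sharing_overhead: float  # assumed fraction of time lost to sharing overhead


def next_allocation(s: VmStats, overhead_limit: float = 0.1) -> int:
    """Return the next GPU memory size (MB) for one VM under the toy policy."""
    if s.starved:
        # Inflate: the workload is slowed down by too little GPU memory.
        return min(s.allocated_mb + STEP_MB, MAX_MB)
    underused = s.allocated_mb - s.used_mb >= STEP_MB
    if underused and s.sharing_overhead > overhead_limit:
        # Deflate: spare memory is causing measurable sharing overhead.
        return max(s.allocated_mb - STEP_MB, MIN_MB)
    # Otherwise leave the allocation unchanged.
    return s.allocated_mb
```

For example, under these assumed values a starved VM holding 384 MB would be grown to 448 MB, while a VM holding 1024 MB but using only 200 MB with high sharing overhead would be shrunk to 960 MB. A real policy would also have to account for the fact that the VMs share one virtual GPU memory space, which this per-VM sketch ignores.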

Performance Evaluation

Related Works

Findings

Conclusion and Future Works