Abstract

Modern automotive-grade embedded platforms feature high-performance Graphics Processing Units (GPUs) to supply the massively parallel processing power needed by next-generation autonomous driving applications. Hence, a GPU scheduling approach with strong real-time guarantees is needed. While previous research efforts focused on reverse engineering the GPU ecosystem in order to understand and control GPU scheduling on NVIDIA platforms, we provide an in-depth explanation of the standard NVIDIA approach to GPU application scheduling on a Drive PX platform. We then discuss how a privileged scheduling server can be used to enforce arbitrary scheduling policies in a virtualized environment.

Highlights

  • Advanced Driver-Assistance Systems (ADAS) often feature an integrated Graphics Processing Unit (GPU) as a massively parallel programmable processor that has to be shared across a potentially large variety of applications, each having different timing requirements

  • The NVIDIA GPU scheduler features a hardware controller embedded in the GPU within a component called “Host”

  • In the NVIDIA runlist approach, the Host scheduler allows only one application to be resident within the GPU engines at any given time, and preemption is only initiated by a timeslice expiration event


Summary

INTRODUCTION

Advanced Driver-Assistance Systems (ADAS) often feature an integrated GPU as a massively parallel programmable processor that has to be shared across a potentially large variety of applications, each having different timing requirements. We disclose and discuss the current NVIDIA approach to GPU scheduling for both graphics and compute applications on the Drive PX-2 “AutoCruise” platform. The board features a single Tegra Parker SoC, which is composed of a hexa-core CPU complex (a four-core ARM Cortex-A57 cluster and a dual-core ARMv8-compatible NVIDIA Denver cluster) and an integrated GPU. We made an extensive effort to provide information on previously undisclosed technical details, whereas previous research contributions mostly involved reverse engineering the architecture [3], due to the closed-source nature of the NVIDIA software ecosystem [6]. We describe how to enforce arbitrary scheduling policies at the hypervisor level, as the NVIDIA Pascal architecture allows for graphics shader/compute kernel preemption at pixel/thread granularity.
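The runlist behavior described above (one application resident on the GPU engines at a time, with preemption triggered only by timeslice expiration) can be approximated by a simple round-robin model. The sketch below is a hypothetical illustration, not NVIDIA's actual Host implementation; the channel names, work units, and `timeslice` parameter are invented for the example.

```python
from collections import deque

def runlist_schedule(runlist, timeslice, total_time):
    """Simulate runlist-style scheduling: each channel in turn becomes
    resident on the GPU and runs until its timeslice expires or its work
    completes; a preempted channel is requeued at the tail of the runlist.
    Returns the order in which channels became resident."""
    queue = deque((name, work) for name, work in runlist if work > 0)
    trace = []
    elapsed = 0
    while queue and elapsed < total_time:
        name, work = queue.popleft()
        trace.append(name)                  # this channel is now resident
        run = min(timeslice, work, total_time - elapsed)
        elapsed += run
        work -= run
        if work > 0:                        # timeslice expired: preempt and
            queue.append((name, work))      # requeue behind the others
    return trace

# Example: two applications sharing the GPU with a timeslice of 2 units.
# trace = runlist_schedule([("A", 4), ("B", 2)], timeslice=2, total_time=10)
```

This model captures the key property discussed in the paper: a channel cannot lose the GPU mid-timeslice, so worst-case blocking for any application grows with the number of entries in the runlist.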

GPU SCHEDULING
FUTURE WORK ON VIRTUALIZATION
