Abstract
Owing to their low overhead and rapid deployment, containers are becoming an increasingly attractive system software platform for deep learning and high performance computing (HPC) applications that leverage GPUs. Unfortunately, existing container software does not control how much GPU memory each container allocates. Therefore, if one container consumes the majority of GPU memory, other containers may be unable to run their workloads because of insufficient memory. This paper presents gShare, a centralized GPU memory management framework that enables GPU memory sharing among containers. Like a modern operating system, gShare allocates the entire GPU memory inside the framework and manages it with sophisticated memory allocators. gShare can then enforce the GPU memory limit of each container by mediating its memory allocation calls. To achieve this objective, gShare introduces API remoting components, a mediator, and a three-level memory allocator, which together enable lightweight and efficient GPU memory management. Our prototype implementation achieves near-native performance with secure isolation and little memory waste in popular deep learning and HPC workloads.
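The idea of enforcing a per-container limit by mediating allocation calls can be illustrated with a minimal sketch. This is not gShare's implementation: the class `Mediator`, its methods, and the byte counts are hypothetical names chosen for illustration, and the sketch only does bookkeeping, leaving out API remoting and the three-level allocator entirely.

```python
class Mediator:
    """Toy mediator (hypothetical): checks each allocation against a per-container limit."""

    def __init__(self, total_bytes):
        self.total_bytes = total_bytes  # memory assumed to be reserved up front by the framework
        self.limits = {}                # container id -> limit in bytes
        self.used = {}                  # container id -> bytes currently allocated
        self.allocations = {}           # handle -> (owning container id, size)
        self._next_handle = 1

    def register(self, container_id, limit_bytes):
        # Refuse limits that would oversubscribe the reserved pool.
        if sum(self.limits.values()) + limit_bytes > self.total_bytes:
            raise ValueError("limits exceed the reserved GPU memory")
        self.limits[container_id] = limit_bytes
        self.used[container_id] = 0

    def malloc(self, container_id, size):
        # Deny the request if it would push the container past its limit.
        if self.used[container_id] + size > self.limits[container_id]:
            return None
        handle = self._next_handle
        self._next_handle += 1
        self.allocations[handle] = (container_id, size)
        self.used[container_id] += size
        return handle

    def free(self, container_id, handle):
        # Only the owning container may release an allocation.
        owner, size = self.allocations.get(handle, (None, 0))
        if owner != container_id:
            raise PermissionError("handle does not belong to this container")
        del self.allocations[handle]
        self.used[container_id] -= size


if __name__ == "__main__":
    mediator = Mediator(total_bytes=1000)
    mediator.register("container-a", 600)
    mediator.register("container-b", 400)
    h = mediator.malloc("container-a", 500)
    print(h is not None)                                  # request within the limit
    print(mediator.malloc("container-a", 200) is None)    # request over the limit is denied
```

In this sketch a container that reaches its limit is refused further memory, while the other container's share stays available, which is the isolation property the abstract describes.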