Abstract

In recent years, Convolutional Neural Networks (CNNs) have enabled unprecedented progress on a wide range of computer vision tasks. However, training large CNNs is resource-intensive and requires specialized Graphics Processing Units (GPUs) and highly optimized implementations to get optimal performance from the hardware. GPU memory is a major bottleneck of the CNN training procedure, limiting the size of both inputs and model architectures. In this paper, we propose to alleviate this memory bottleneck by leveraging an under-utilized resource of modern systems: the device-to-host bandwidth. Our method, termed CPU offloading, works by transferring hidden activations to the CPU as soon as they are computed, in order to free GPU memory for upstream layer computations during the forward pass. These activations are then transferred back to the GPU as they are needed by the gradient computations of the backward pass. The key challenge of our method is to efficiently overlap data transfers with computations so as to minimize the wall time overhead induced by the additional data transfers. On a typical workstation with an Nvidia Titan X GPU, we show that our method compares favorably to gradient checkpointing, as we are able to reduce the memory consumption of training a VGG19 model by 35% with a minimal additional wall time overhead of 21%. Further experiments detail the impact of the different optimization tricks we propose. Our method is orthogonal to other memory-reduction techniques such as quantization and sparsification, so they can easily be combined for further optimizations.
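To make the mechanism concrete, the following is a minimal, hypothetical sketch of activation offloading built on PyTorch's autograd.Function API; the OffloadedReLU class and its structure are illustrative assumptions rather than the authors' implementation. The forward pass copies the activation needed for the gradient into pinned host memory, and the backward pass copies it back to the GPU just before it is used.

# Minimal sketch (not the authors' code): a ReLU whose saved input lives in
# pinned CPU memory between the forward and backward passes.
import torch


class OffloadedReLU(torch.autograd.Function):

    @staticmethod
    def forward(ctx, x):
        y = torch.relu(x)
        # Copy the activation needed for backward to pinned host memory so the
        # GPU copy can be freed; non_blocking=True lets the copy overlap other work.
        cpu_copy = torch.empty(x.shape, dtype=x.dtype, pin_memory=True)
        cpu_copy.copy_(x, non_blocking=True)
        ctx.save_for_backward(cpu_copy)
        return y

    @staticmethod
    def backward(ctx, grad_out):
        (cpu_copy,) = ctx.saved_tensors
        # Bring the activation back to the GPU right before the gradient needs it.
        # Both copies are issued on the same (default) CUDA stream, which orders them.
        x = cpu_copy.to(grad_out.device, non_blocking=True)
        return grad_out * (x > 0).to(grad_out.dtype)


# Usage inside a module's forward: y = OffloadedReLU.apply(x)

In this simple form the copies are only ordered with respect to the surrounding kernels; hiding them behind GPU compute with a dedicated copy stream is sketched after the highlights below.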

Highlights

  • Over the last few years, Convolutional Neural Networks (CNNs) [1] [2] have enabled unprecedented progress on a wide array of computer vision tasks

  • On a typical workstation with an Nvidia Titan X Graphics Processing Unit (GPU), we show that our method compares favorably to gradient checkpointing, as we are able to reduce the memory consumption of training a VGG19 model by 35% with a minimal additional wall time overhead of 21%

  • The key challenge in our implementation is to synchronize the data transfers with the computations so that only the minimal amount of activation values is kept in GPU memory at any given time, while as little time as possible is spent waiting for data transfers (see the sketch after this list)
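
The following hypothetical sketch illustrates one way to schedule such transfers with PyTorch's CUDA stream and event primitives; the offload_async and prefetch helpers are illustrative names, not the authors' code. A dedicated copy stream performs the device-to-host transfer while the compute stream keeps running, and an event guards the backward pass against reading host memory before the copy has landed.

# Hypothetical sketch of overlapping transfers with compute, assuming
# PyTorch's CUDA stream and event APIs.
import torch

copy_stream = torch.cuda.Stream()  # dedicated stream for host <-> device copies


def offload_async(activation):
    # Start copying `activation` to pinned host memory without blocking the
    # compute stream; return the host buffer and an event marking completion.
    cpu_buf = torch.empty(activation.shape, dtype=activation.dtype, pin_memory=True)
    copy_stream.wait_stream(torch.cuda.current_stream())  # wait for the producer kernel
    with torch.cuda.stream(copy_stream):
        cpu_buf.copy_(activation, non_blocking=True)
        done = torch.cuda.Event()
        done.record()
    # Keep the GPU buffer alive until the copy stream has finished reading it.
    activation.record_stream(copy_stream)
    return cpu_buf, done


def prefetch(cpu_buf, done, device="cuda"):
    # Bring an offloaded activation back to the GPU before backward needs it.
    done.synchronize()  # ensure the device-to-host copy has landed in host memory
    with torch.cuda.stream(copy_stream):
        gpu_tensor = cpu_buf.to(device, non_blocking=True)
    # Make subsequent kernels on the compute stream wait for the copy.
    torch.cuda.current_stream().wait_stream(copy_stream)
    gpu_tensor.record_stream(torch.cuda.current_stream())
    return gpu_tensor

In a full implementation, the prefetch for one layer would be issued while the backward pass is still processing the layer above it, so the event wait rarely blocks the compute stream.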


Summary

Introduction

Over the last few years, Convolutional Neural Networks (CNNs) [1] [2] have enabled unprecedented progress on a wide array of computer vision tasks. One disadvantage of these approaches is their resource consumption: training deep models within a reasonable amount of time requires specialized Graphics Processing Units (GPUs) with numerous cores and a large memory capacity. The memory capacity of typical desktop GPUs is too small for training large CNNs. As a result, getting into deep learning research comes with the barrier cost of either buying specialized hardware or renting live instances from cloud service providers, while standard laptop GPUs remain idle, untapped resources. Reducing the memory cost of training deep models allows deep networks to be trained on standard graphics cards without the need for specialized hardware, effectively removing this barrier cost.
