Abstract

Deep neural networks (DNNs) are becoming increasingly deep, wide, and non-linear due to growing demands on prediction accuracy and analysis quality. Training wide and deep neural networks requires large amounts of storage resources such as memory, because the intermediate activation data must be saved in memory during forward propagation and then restored for backward propagation. However, state-of-the-art accelerators such as GPUs are equipped with only very limited memory capacities due to hardware design constraints, which significantly limits the maximum batch size and hence the performance speedup when training large-scale DNNs. Traditional memory-saving techniques either suffer from performance overhead or are constrained by limited interconnect bandwidth or a specific interconnect technology. In this paper, we propose a novel memory-efficient CNN training framework (called COMET) that leverages error-bounded lossy compression to significantly reduce the memory requirement for training, in order to allow training larger models or to accelerate training. Our framework purposely adopts error-bounded lossy compression with a strict error-controlling mechanism. Specifically, we perform a theoretical analysis of the compression error propagation from the altered activation data to the gradients, and empirically investigate the impact of the altered gradients on the training process. Based on these analyses, we optimize the error-bounded lossy compression and propose an adaptive error-bound control scheme for activation data compression. Experiments demonstrate that our proposed framework can reduce training memory consumption by up to 13.5X over baseline training and by up to 1.8X over another state-of-the-art compression-based framework, with little or no accuracy loss.
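
The core idea described above is to keep only an error-bounded lossy representation of each layer's activations in memory during the forward pass and decompress it for the backward pass. Below is a minimal Python sketch of that idea, assuming a simple uniform quantizer as a stand-in for a full error-bounded lossy compressor; the function names, the fixed error bound, and the tensor shapes are illustrative assumptions, not the framework's API, and the adaptive error-bound control from the paper is not modeled here.

```python
# Minimal sketch (not the authors' implementation) of error-bounded lossy
# compression of activation data during training. A uniform quantizer with
# step 2*error_bound guarantees the per-element reconstruction error stays
# within the user-specified absolute error bound.
import numpy as np

def compress_activation(act: np.ndarray, error_bound: float) -> np.ndarray:
    """Quantize activations so that |act - decompressed| <= error_bound."""
    return np.round(act / (2.0 * error_bound)).astype(np.int32)

def decompress_activation(codes: np.ndarray, error_bound: float) -> np.ndarray:
    """Reconstruct approximate activations from the quantization codes."""
    return codes.astype(np.float32) * (2.0 * error_bound)

# Hypothetical usage: store only the compressed codes after the forward pass,
# then decompress them when the gradients are computed in the backward pass.
activation = np.random.randn(64, 256).astype(np.float32)  # one layer's activation
eb = 1e-2                                                  # adaptive in the paper; fixed here
saved_codes = compress_activation(activation, eb)          # kept in memory instead of `activation`
restored = decompress_activation(saved_codes, eb)          # used when computing gradients
assert np.max(np.abs(restored - activation)) <= eb + 1e-6  # error stays within the bound
```

In practice a real error-bounded compressor would further encode the quantization codes (e.g., with prediction and entropy coding) to reach the reported compression ratios; the sketch only illustrates how a strict per-element error bound on the stored activations is enforced.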
