Abstract
Graphics Processing Units (GPUs) have evolved into powerful co-processors for training convolutional neural networks (CNNs). Many new features, such as concurrent kernel execution and Hyper-Q technology, have been introduced into GPUs. Orchestrating concurrency for CNN training on GPUs is challenging, since it may introduce synchronization overhead and poor resource utilization. Unlike previous research, which mainly focuses on single-layer or coarse-grained optimization, we introduce a critical-path based, asynchronous parallelization mechanism and propose an optimization technique for CNN training that takes the global network architecture and GPU resource usage into account together. The proposed methods can effectively overlap synchronization with computation in different streams, thereby accelerating the CNN training process. We have integrated our methods into Caffe. The experimental results show that Caffe integrated with our methods achieves a 1.30X performance speedup on average over Caffe+cuDNN, and even higher speedups for deeper, wider, and more complicated networks.
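To make the stream-level overlap concrete, the minimal CUDA sketch below is our illustration, not the paper's TurboDL code: the layer_forward kernel and the stream/event names are hypothetical stand-ins. It shows how an event recorded on a critical-path stream lets a dependent kernel in a second stream wait on the device via cudaStreamWaitEvent, so cross-stream synchronization overlaps with computation instead of stalling the host.

// Sketch only: two streams overlap independent kernels, while a cudaEvent
// expresses a critical-path dependency without blocking the host thread.
#include <cuda_runtime.h>
#include <cstdio>

__global__ void layer_forward(float *data, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] = data[i] * 2.0f + 1.0f;  // stand-in for a layer's math
}

int main() {
    const int n = 1 << 20;
    float *a, *b;
    cudaMalloc(&a, n * sizeof(float));
    cudaMalloc(&b, n * sizeof(float));
    cudaMemset(a, 0, n * sizeof(float));
    cudaMemset(b, 0, n * sizeof(float));

    cudaStream_t s_crit, s_side;          // critical-path and off-path streams
    cudaStreamCreate(&s_crit);
    cudaStreamCreate(&s_side);
    cudaEvent_t done;                     // marks completion of the critical kernel
    cudaEventCreate(&done);

    dim3 block(256), grid((n + 255) / 256);

    // Critical-path kernel runs in its own stream; record an event after it.
    layer_forward<<<grid, block, 0, s_crit>>>(a, n);
    cudaEventRecord(done, s_crit);

    // Independent work proceeds concurrently in a second stream.
    layer_forward<<<grid, block, 0, s_side>>>(b, n);

    // The dependent kernel waits on the event inside the GPU, not on the host,
    // so the synchronization overlaps with the computation already in flight.
    cudaStreamWaitEvent(s_side, done, 0);
    layer_forward<<<grid, block, 0, s_side>>>(a, n);

    cudaDeviceSynchronize();
    printf("done\n");

    cudaEventDestroy(done);
    cudaStreamDestroy(s_crit);
    cudaStreamDestroy(s_side);
    cudaFree(a);
    cudaFree(b);
    return 0;
}

In a full framework the event would guard a real cross-layer dependency, for example a gradient produced on the critical path and consumed by another layer's update.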
Highlights
Deep neural networks (DNNs) have been widely applied to solve problems in many practical fields, such as image classification, object detection, speech recognition, and language translation.
We can use the following mechanism to further improve the performance of TurboDL in the multi-GPU setting.
While we mainly use CNNs as the example to show the effectiveness of our methods, they have excellent potential to be applied to other complicated multi-stage applications, such as database query processing, and to other network architectures, such as RNNs, tree neural networks, generative adversarial networks, and graph convolutional networks (GCNs).
Summary
Deep neural networks (DNNs) have been widely applied to solve problems in many practical fields, such as image classification, object detection, speech recognition, and language translation. Since training deep neural networks is a very time- and resource-consuming task, general-purpose graphics processing units (GPUs) are often used to accelerate the training process. It should be noted that because existing platforms are optimized for current GPUs, they may need to be revised as GPU architectures evolve in order to make efficient use of the features added in new architectures and retain good performance. This type of re-optimization is a non-trivial task.