Abstract

Deep neural networks (DNNs) have become increasingly important for big data analysis. They typically rely on data parallelism or model parallelism for extreme-scale computing. However, both approaches achieve their performance gains mainly through coarse-grained parallelization schemes; neither fully exploits the parallelism that many-core systems (such as GPUs) offer to neural network models. Here, a new fine-grained parallelism strategy, named FiLayer, is presented based on layer-wise parallelization. It has two components: inter-layer parallelism and intra-layer parallelism. Inter-layer parallelism processes several neighboring layers of a network model in a pipelined manner. For intra-layer parallelism, the operations in one layer are split into several parts and processed concurrently. Both fine-grained parallelism methods are implemented with CUDA streams. A mathematical analysis is presented of how the number of fragments affects the performance of inter-layer parallelism, along with an analysis of how the number of CUDA streams affects the performance of intra-layer parallelism. The proposed approach is realized on top of Caffe and evaluated on representative datasets including CIFAR-100 and ImageNet. The evaluation results show that FiLayer achieves remarkable speedups over Caffe, which is significant for big data analysis.
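
To make the pipelining idea concrete, the following is a minimal CUDA sketch of inter-layer parallelism as the abstract describes it: a mini-batch is split into fragments, each fragment is bound to its own CUDA stream, and layer kernels are issued per fragment so that one layer's work on a fragment can overlap with another layer's work on the next fragment. The kernel name, fragment count, and sizes are illustrative assumptions, not taken from the FiLayer or Caffe sources.

```cuda
// Hypothetical sketch of inter-layer pipelining with CUDA streams.
// Each fragment of the mini-batch gets its own stream, so layer L on
// fragment k may overlap with layer L-1 on fragment k+1.
#include <cuda_runtime.h>

#define NUM_FRAGMENTS 4        // assumed fragment count
#define NUM_LAYERS    3        // assumed pipeline depth
#define FRAG_ELEMS    (1 << 16)

// Stand-in for a real layer's forward kernel (here a simple ReLU).
__global__ void layer_forward(float* data, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] = data[i] > 0.f ? data[i] : 0.f;
}

int main() {
    float* d_frag[NUM_FRAGMENTS];
    cudaStream_t stream[NUM_FRAGMENTS];
    for (int k = 0; k < NUM_FRAGMENTS; ++k) {
        cudaMalloc(&d_frag[k], FRAG_ELEMS * sizeof(float));
        cudaStreamCreate(&stream[k]);
    }
    dim3 block(256), grid((FRAG_ELEMS + 255) / 256);
    // Issue each layer for each fragment on that fragment's stream;
    // kernels on different streams may run concurrently, forming a pipeline.
    for (int l = 0; l < NUM_LAYERS; ++l)
        for (int k = 0; k < NUM_FRAGMENTS; ++k)
            layer_forward<<<grid, block, 0, stream[k]>>>(d_frag[k], FRAG_ELEMS);
    cudaDeviceSynchronize();
    for (int k = 0; k < NUM_FRAGMENTS; ++k) {
        cudaStreamDestroy(stream[k]);
        cudaFree(d_frag[k]);
    }
    return 0;
}
```

Because issue order within one stream is serialized, dependencies between consecutive layers of the same fragment are preserved automatically, while independent fragments are free to overlap; this is the property the fragment-count analysis in the paper reasons about.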
