Abstract

Deep learning systems composed of many layers are increasingly deployed across diverse application areas. To achieve good performance, multicore CPUs and hardware accelerators are widely used in real systems. Previous studies show that GPUs can significantly speed up computation in deep neural networks, whereas performance does not scale as well on multicore CPUs. In this letter, we run Caffe on various hardware platforms under different computation setups to train LeNet-5 on the MNIST dataset, and we measure the individual time spent in the forward and backward passes of each layer. We find that the speedups differ considerably across layers and that the scalability of the multicore CPU varies depending on which stage of the network is being processed. Based on these observations, we show that it is worth applying a different execution policy to each layer to optimize overall performance. In addition, our benchmarking results can serve as a reference for developing dedicated acceleration methods for individual layers of the network.
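The per-layer timing methodology summarized above can be approximated with Caffe's Python interface. The sketch below is a minimal illustration, assuming a LeNet training prototxt named lenet_train_test.prototxt (file name hypothetical) whose data layer points at an MNIST LMDB; Caffe's built-in `caffe time` tool performs a similar per-layer measurement with proper GPU synchronization, so wall-clock figures from this sketch should be treated as approximate rather than as the paper's exact procedure.

```python
# Hedged sketch: per-layer forward/backward timing with pycaffe.
# File name and iteration count are illustrative assumptions, not taken from the paper.
import time
import caffe

caffe.set_mode_gpu()      # or caffe.set_mode_cpu() for the CPU configurations
caffe.set_device(0)

net = caffe.Net('lenet_train_test.prototxt', caffe.TRAIN)
net.forward()             # warm-up pass: loads a batch and allocates memory
net.backward()

ITERS = 50
for name in list(net._layer_names):
    # Time the forward pass of this single layer.
    t0 = time.time()
    for _ in range(ITERS):
        net.forward(start=name, end=name)
    fwd_ms = (time.time() - t0) / ITERS * 1000

    # Time the backward pass of this single layer.
    t0 = time.time()
    for _ in range(ITERS):
        net.backward(start=name, end=name)
    bwd_ms = (time.time() - t0) / ITERS * 1000

    # Note: GPU kernel launches may be asynchronous, so these host-side timings
    # are approximate; the `caffe time` tool uses event-based GPU timers instead.
    print('%-12s forward %8.3f ms   backward %8.3f ms' % (name, fwd_ms, bwd_ms))
```

Running the same script once per hardware platform (GPU, single-core CPU, multicore CPU) yields the per-layer forward and backward durations from which layer-wise speedups and CPU scalability can be compared.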
