Abstract
Due to the excessive number of training parameters and the heavy computation of Deep Neural Network (DNN) models, training time keeps growing as the scale of DNN models continues to increase. Convolution is the key step of feature extraction in DNN models and accounts for about 90% of the computation operations in these models. It is therefore of great necessity to accelerate convolution calculation in order to improve the training efficiency of the system. Currently, the conventional approach is to transform convolution into matrix multiplication and execute it on many-core architectures such as the Graphics Processing Unit (GPU). However, due to the matrix conversion and the high computational complexity of matrix multiplication, methods based on matrix multiplication, such as Caffe, spend a great deal of time accessing memory and must copy redundant data during matrix conversion. The limited capacity of GPU memory further lowers the training efficiency of DNN models. Therefore, we orchestrate a dynamic look-up table method in place of matrix multiplication to realize convolution calculation and thereby optimize DNN training on many-core architectures. We further improve the parallelism of the look-up table method at a finer granularity by parallelizing the building of the convolution table and the table look-up operation on the GPU. In our experiments, we trained and tested on the MNIST, CIFAR-10, CIFAR-100, and ImageNet datasets. Experiments show that, compared to the original Caffe, the proposed dynamic look-up table method, referred to as LTP-Caffe, achieves up to a 31% speed-up in the training of DNN models. Experiments further show that LTP-Caffe matches Caffe in accuracy while iterating faster.
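The abstract describes the table-lookup idea only at a high level. As a rough illustration of the general technique, not the authors' LTP-Caffe implementation, the following NumPy sketch builds a convolution index table once per layer shape and then computes convolution by table lookup rather than by materializing an im2col matrix on every call. The function names (build_conv_table, conv2d_lookup) and the single-channel, stride-1 setting are our own assumptions for the sketch.

```python
import numpy as np

def build_conv_table(H, W, KH, KW, stride=1):
    """Precompute, once per layer shape, the flat input index read by
    each (output position, kernel element) pair."""
    OH = (H - KH) // stride + 1
    OW = (W - KW) // stride + 1
    table = np.empty((OH * OW, KH * KW), dtype=np.int64)
    for oy in range(OH):
        for ox in range(OW):
            row = oy * OW + ox
            for ky in range(KH):
                for kx in range(KW):
                    table[row, ky * KW + kx] = (oy * stride + ky) * W + (ox * stride + kx)
    return table, OH, OW

def conv2d_lookup(image, kernel, table, OH, OW):
    """Convolve by gathering input values through the precomputed table
    and reducing against the kernel, instead of building an explicit
    im2col matrix for each forward pass."""
    patches = image.ravel()[table]                  # (OH*OW, KH*KW) gather via fancy indexing
    return (patches @ kernel.ravel()).reshape(OH, OW)

# Hypothetical usage: a 5x5 image convolved with a 3x3 kernel.
img = np.arange(25, dtype=np.float32).reshape(5, 5)
ker = np.ones((3, 3), dtype=np.float32)
tbl, OH, OW = build_conv_table(5, 5, 3, 3)
print(conv2d_lookup(img, ker, tbl, OH, OW))         # matches a direct 3x3 convolution
```

In this sketch the index table is built once and reused across iterations; the gather and reduction steps are both data-parallel, which is the property the abstract exploits when parallelizing table building and table lookup on the GPU.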