Abstract

Parallel learning in neural networks can greatly shorten the training time. Prior efforts were mostly limited to distributing inputs across multiple computing engines, because the gradient descent algorithm used in neural network training is inherently sequential. This paper proposes a novel parallel training method for CNN-based image recognition. It overcomes the sequential nature of gradient descent and enables parallel training through speculative backpropagation. We found that the Softmax and ReLU outcomes of the forward propagation for inputs with the same label are likely to be very similar. This characteristic makes it possible to perform the forward and backward propagation simultaneously. We implemented the proposed parallel model with CNNs in both software and hardware and evaluated its performance. The parallel training reduces the training time by 34% on CIFAR-100 with no loss of prediction accuracy compared to sequential training; in many cases, it even improves the accuracy.
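
As a rough illustration of the speculation condition described above, the sketch below (not the authors' code; the SpeculationCache class, its methods, and the toy data are hypothetical) caches the most recent softmax output seen for each label and compares it with a freshly computed one. When the two are close, a backward pass started from the cached output would have produced nearly the same gradient, which is what allows the backward pass to begin before the forward pass finishes.

  import numpy as np

  def softmax(z):
      e = np.exp(z - z.max())
      return e / e.sum()

  class SpeculationCache:
      """Hypothetical helper: stores the latest softmax output per class label."""
      def __init__(self):
          self.cache = {}                       # label -> cached softmax vector
      def predict(self, label):
          return self.cache.get(label)          # speculative activation, or None
      def update(self, label, softmax_out):
          self.cache[label] = softmax_out.copy()

  # Toy check: inputs with the same label tend to yield similar logits, so the
  # cached softmax is usually close to the real one and speculation succeeds.
  cache = SpeculationCache()
  label = 3
  z_prev = np.random.randn(10)
  cache.update(label, softmax(z_prev))
  z_now = z_prev + 0.01 * np.random.randn(10)   # a later sample of the same class
  spec, real = cache.predict(label), softmax(z_now)
  print(np.abs(spec - real).max())              # small value => speculation usable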

Highlights

  • Artificial neural networks (ANNs) have successfully been applied in various applications such as text recognition [1], image classification [2], and speech recognition [3]

  • As a deep neural network (DNN) model grows in size, training requires a large number of vector-matrix multiplication (VMM) operations

  • We propose a novel idea of breaking the sequential property of the gradient descent algorithm for convolutional neural network (CNN) parallel training


Summary

INTRODUCTION

Artificial neural networks (ANNs) have successfully been applied in various applications such as text recognition [1], image classification [2], and speech recognition [3]. As a DNN model grows in size, training requires a large number of vector-matrix multiplication (VMM) operations, and the computational complexity increases proportionally with the number of layers and parameters. This means that training a DNN takes a huge amount of time. Because the gradient descent algorithm is inherently sequential, the forward propagation must complete before the backpropagation can begin. We propose a novel idea of breaking this sequential property of the gradient descent algorithm for CNN parallel training, enabling the forward and backward propagations to be performed in parallel. The hardware accelerator exhibits superior performance per watt because it requires only 1.2% more memory.
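
A minimal sketch of how the forward and backward propagations could overlap, assuming a single softmax layer and a two-worker thread pool. This is an illustrative reconstruction, not the paper's software or hardware implementation; the function names, the tolerance, and the fallback policy are assumptions.

  import concurrent.futures
  import numpy as np

  def forward(W, x):
      z = W @ x                                 # vector-matrix multiplication
      e = np.exp(z - z.max())
      return e / e.sum()                        # softmax output

  def backward(softmax_out, x, label, num_classes):
      # Cross-entropy gradient w.r.t. W for a (real or speculative) softmax output.
      one_hot = np.eye(num_classes)[label]
      return np.outer(softmax_out - one_hot, x)

  def speculative_step(W, x, label, cached_softmax, lr=0.01, tol=1e-2):
      num_classes = W.shape[0]
      with concurrent.futures.ThreadPoolExecutor(max_workers=2) as pool:
          fwd = pool.submit(forward, W, x)                        # real forward pass
          bwd = pool.submit(backward, cached_softmax, x,          # speculative backward
                            label, num_classes)
          real_softmax, grad = fwd.result(), bwd.result()
      if np.abs(real_softmax - cached_softmax).max() > tol:
          # Speculation failed: redo the backward pass with the real activations.
          grad = backward(real_softmax, x, label, num_classes)
      return W - lr * grad, real_softmax

  # Toy usage with random data; the cached output stands in for an earlier
  # forward pass over a sample of the same label.
  W, x = np.random.randn(10, 32) * 0.01, np.random.randn(32)
  W, out = speculative_step(W, x, label=3, cached_softmax=forward(W, x))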

RELATED WORK
THE NEURAL NETWORK TRAINING
SPECULATIVE BACKPROPAGATION
IMPLEMENTATION AND OPTIMIZATION OF HW PARALLEL TRAINING
EVALUATION
DISCUSSION
CONCLUSION