Abstract
In a previous work we detailed the requirements for obtaining maximal deep learning performance benefit by implementing fully connected deep neural networks (DNNs) as arrays of resistive devices. Here we extend the concept of Resistive Processing Unit (RPU) devices to convolutional neural networks (CNNs). We show how to map convolutional layers to fully connected RPU arrays such that the parallelism of the hardware can be fully utilized in all three cycles of the backpropagation algorithm. We find that the noise and bound limitations imposed by the analog nature of the computations performed on the arrays significantly affect the training accuracy of CNNs. We present noise and bound management techniques that mitigate these problems without adding any complexity to the analog circuits, since they can be handled entirely by the digital circuits. In addition, we discuss digitally programmable update management and device variability reduction techniques that can be applied selectively to some of the layers in a CNN. We show that a combination of all these techniques enables the successful application of the RPU concept to training CNNs. The techniques discussed here are more general and can be applied beyond CNN architectures, thereby extending the applicability of the RPU approach to a large class of neural network architectures.
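The mapping described above is, in essence, an im2col-style lowering: the k x k input patches seen by a kernel are unrolled into the columns of a matrix, so the convolution becomes the same matrix operation that a fully connected layer performs and can be executed as parallel reads of an RPU array. Below is a minimal single-channel NumPy sketch of this idea, assuming unit stride and no padding; all names and shapes are illustrative, not taken from the paper.

```python
import numpy as np

def im2col(x, k):
    """Unroll every k x k patch of a 2-D input map into one column,
    so that a k x k convolution becomes a plain matrix product."""
    h, w = x.shape
    out_h, out_w = h - k + 1, w - k + 1
    cols = np.empty((k * k, out_h * out_w))
    for i in range(out_h):
        for j in range(out_w):
            cols[:, i * out_w + j] = x[i:i + k, j:j + k].ravel()
    return cols

rng = np.random.default_rng(0)
x = rng.standard_normal((28, 28))      # one input feature map (illustrative size)
W = rng.standard_normal((16, 5 * 5))   # 16 5x5 kernels, stored as RPU conductances

cols = im2col(x, 5)                    # (25, 576): one column per patch
y = (W @ cols).reshape(16, 24, 24)     # each column is one parallel array read
```

Because the backward pass reduces to a transposed read over the same unrolled matrix and the update to an outer product of the stored columns with the error signals, this lowering lets all three cycles of backpropagation reuse the array's parallelism.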
Highlights
Deep neural network (DNN)-based models (LeCun et al., 2015) have demonstrated unprecedented accuracy, in some cases exceeding human-level performance, in cognitive tasks such as object recognition (Krizhevsky et al., 2012; He et al., 2015; Simonyan and Zisserman, 2015; Szegedy et al., 2015), speech recognition (Hinton et al., 2012), and natural language processing (Collobert et al., 2012).
Our analysis shows that the larger test error is mainly due to analog noise introduced during the backward cycle and to the signal bounds imposed in the forward cycle on the final Resistive Processing Unit (RPU) array, W4.
The combination of all of the management techniques with the 13-device mapping on the second convolutional layer (K2) brings the model's test error to 0.8%. The performance of this final RPU model is almost indistinguishable from that of the floating point (FP) baseline model, demonstrating the successful application of the RPU approach to training convolutional neural networks (CNNs).
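The noise and bound management referred to here can be viewed as purely digital pre- and post-scaling around each analog array read. The sketch below is a simplified NumPy model under assumptions of our own choosing (additive Gaussian output noise, a hard saturation bound, and the specific constants and names used): the input vector is rescaled by its largest entry so the analog input range is fully used, and if the output still saturates, the scale is reduced and the read repeated.

```python
import numpy as np

BOUND = 12.0   # assumed saturation bound of the analog output

def analog_read(W, x, noise_std=0.06, rng=np.random.default_rng()):
    """Stand-in for one analog matrix-vector read on an RPU array:
    additive Gaussian noise plus hard clipping at the output bound."""
    y = W @ x + noise_std * rng.standard_normal(W.shape[0])
    return np.clip(y, -BOUND, BOUND)

def managed_read(W, x):
    """Noise management: divide x by its max magnitude before the read.
    Bound management: if the output saturates, double the divisor and
    repeat. Both scalings live entirely in the digital periphery."""
    alpha = max(np.max(np.abs(x)), 1e-12)
    while True:
        y = analog_read(W, x / alpha)
        if np.max(np.abs(y)) < BOUND:   # no saturation: accept the read
            return alpha * y            # undo the scaling digitally
        alpha *= 2.0                    # shrink the analog signals and retry
```

Note the built-in tension: noise management wants the divisor as small as possible so the signal dominates the noise after the final multiply-back, while bound management may force it larger; the retry loop resolves this at the cost of occasional repeated reads.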
Summary
Deep neural network (DNN)-based models (LeCun et al., 2015) have demonstrated unprecedented accuracy, in some cases exceeding human-level performance, in cognitive tasks such as object recognition (Krizhevsky et al., 2012; He et al., 2015; Simonyan and Zisserman, 2015; Szegedy et al., 2015), speech recognition (Hinton et al., 2012), and natural language processing (Collobert et al., 2012). These accomplishments are made possible by advances in computing architectures and the availability of large amounts of labeled training data. We show that the RPU concept is applicable to training CNNs.