Abstract

Analog arrays are a promising emerging hardware technology with the potential to drastically speed up deep learning. Their main advantage is that they employ analog circuitry to compute matrix-vector products in constant time, irrespective of the size of the matrix. However, ConvNets map very unfavorably onto analog arrays when this is done in a straightforward manner, because kernel matrices are typically small and the constant-time operation must be sequentially iterated a large number of times. Here, we propose to parallelize training by replicating the kernel matrix of a convolution layer on distinct analog arrays and randomly dividing parts of the compute among them. With this modification, analog arrays execute ConvNets with a large acceleration factor that is proportional to the number of kernel matrices used per layer (tested here with 16-1024 replicas). Despite the larger number of free parameters, we show analytically and in numerical experiments that this new convolution architecture is self-regularizing and implicitly learns similar filters across arrays. We also report superior performance on a number of datasets and increased robustness to adversarial attacks. Our investigation suggests revising the notion that emerging hardware architectures featuring analog arrays for fast matrix-vector multiplication are not suitable for ConvNets.
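To make the proposed scheme concrete, here is a minimal NumPy sketch of a forward pass with replicated kernel matrices, as described above: each image patch is multiplied by one randomly chosen replica, so the sequential patch-by-patch work can be spread across analog arrays. All names (`random_tiling_conv`, `kernels`, `tile`) are illustrative rather than taken from the paper, and the sketch assumes stride 1, no padding, and ignores the learning rule.

```python
import numpy as np

rng = np.random.default_rng(0)

def random_tiling_conv(x, kernels, k):
    """Convolution in which every image patch is routed to one of several
    replicated kernel matrices ("tiles"), chosen uniformly at random.
    In the proposed hardware mapping, each matrix in `kernels` would sit
    on its own analog array.

    x       : input of shape (C, H, W)
    kernels : list of replicas, each of shape (C_out, C * k * k)
    k       : spatial kernel size (stride 1, no padding, for brevity)
    """
    C, H, W = x.shape
    out_h, out_w = H - k + 1, W - k + 1
    out = np.empty((kernels[0].shape[0], out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            patch = x[:, i:i + k, j:j + k].reshape(-1)  # im2col column
            tile = rng.integers(len(kernels))           # random replica
            out[:, i, j] = kernels[tile] @ patch        # one analog MVM
    return out

# Toy usage: 4 replicas of a 3x3 conv, 3 input channels, 8 output channels.
kernels = [0.1 * rng.standard_normal((8, 3 * 3 * 3)) for _ in range(4)]
y = random_tiling_conv(rng.standard_normal((3, 32, 32)), kernels, 3)
print(y.shape)  # (8, 30, 30)
```

On real hardware, each `kernels[tile] @ patch` product would be a single constant-time analog operation, so K replicas cut the sequential depth of the layer roughly by a factor of K.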

Highlights

  • Training deep networks is notoriously computationally intensive

  • We tested the performance of this setup on three datasets, with and without tiling, and compared different tiling schemes at floating-point (FP) precision

  • The main results from these experiments are: (1) Random tiling achieves the best performance among all tiling schemes; (2) Across datasets, random tiling comes close to or beats the regular ConvNet without tiling; (3) Subsampling the input images is not sufficient to explain the high performance of random tiling, since the perforated scheme generally performed poorly (see the sketch after this list)
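As referenced in result (3), the tiling schemes differ mainly in how image patches are assigned to the kernel-matrix replicas. The sketch below gives one plausible reading of the two extremes named in the highlights: random tiling covers every output position, while the perforated scheme merely subsamples them. Only the names "random" and "perforated" come from the text; the exact assignment rules below are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)

def assign_random(n_patches, n_tiles):
    """Random tiling: every patch is computed, each by a random replica."""
    return rng.integers(n_tiles, size=n_patches)

def assign_perforated(n_patches, n_tiles, keep_frac=0.25):
    """Perforated scheme: only a random subset of patches is computed at
    all; -1 marks a dropped patch whose output stays zero (or must be
    interpolated). This subsamples the input rather than covering it."""
    a = np.full(n_patches, -1)
    kept = rng.choice(n_patches, size=int(keep_frac * n_patches),
                      replace=False)
    a[kept] = rng.integers(n_tiles, size=kept.size)
    return a

# With 4 tiles and 900 patches, random tiling covers all 900 output
# positions, while the perforated scheme evaluates only 225 of them.
print((assign_random(900, 4) >= 0).sum())      # 900
print((assign_perforated(900, 4) >= 0).sum())  # 225
```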


Introduction

Training deep networks is notoriously computationally intensive. The popularity of ConvNets is largely due to the reduced computational burden they allow thanks to their parsimonious number of free parameters (as compared to fully connected networks), and their favorable mapping onto existing graphics processing units (GPUs; Chetlur et al., 2014). Recently, speedup strategies for the matrix multiply-and-accumulate (MAC) operation (the computational workhorse of deep learning) based on mixed analog-digital approaches have been gaining increasing attention. These analog arrays rely on the idea of implementing matrix-vector multiplications on an array of analog devices by exploiting their Ohmic properties, resulting in a one-step constant-time operation, i.e., with execution time independent of the matrix size (up to size limitations due to the device technology; Gokmen and Vlasov, 2016). As a result, the forward, backward, and update steps of the back-propagation algorithm can be performed with significantly reduced data movement.
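To see why this constant-time operation pays off so unevenly, note that a convolution is typically lowered to matrix form (im2col), where the layer's small kernel matrix multiplies one column per output position. On an analog array this means one constant-time matrix-vector product per output pixel, iterated sequentially. The back-of-the-envelope count below is illustrative (stride 1, no padding assumed); the constant-time-per-product assumption follows Gokmen and Vlasov (2016).

```python
def sequential_mvm_calls(h, w, k, stride=1):
    """Sequential analog matrix-vector products needed for one conv layer
    lowered via im2col: one constant-time MVM per output position, each
    against a small (C_out x C*k*k) kernel matrix."""
    out_h = (h - k) // stride + 1
    out_w = (w - k) // stride + 1
    return out_h * out_w

# A 3x3 convolution over a 32x32 feature map needs 900 sequential calls,
# whereas a fully connected layer is a single constant-time MVM, however
# large its weight matrix (within device limits).
print(sequential_mvm_calls(32, 32, 3))  # 900
```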
