Abstract
While promising for high-capacity machine learning accelerators, memristor devices have non-idealities that prevent software-equivalent accuracies when used for online training. This work uses a combination of Mini-Batch Gradient Descent (MBGD) to average gradients, stochastic rounding to avoid vanishing weight updates, and decomposition methods to keep the memory overhead low during mini-batch training. Since the weight update has to be transferred to the memristor matrices efficiently, we also investigate the impact of reconstructing the gradient matrices both internally (rank-seq) and externally (rank-sum) to the memristor array. Our results show that streaming batch principal component analysis (streaming batch PCA) and non-negative matrix factorization (NMF) decomposition algorithms can achieve near-MBGD accuracy in a memristor-based multi-layer perceptron trained on the MNIST (Modified National Institute of Standards and Technology) database with only 3 to 10 ranks, at significant memory savings. Moreover, NMF rank-seq outperforms streaming batch PCA rank-seq at low ranks, making it more suitable for hardware implementation in future memristor-based accelerators.
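To make the stochastic-rounding step concrete, below is a minimal NumPy sketch of how sub-step weight updates can be rounded to a device's programmable step size without systematically vanishing. The function name `stochastic_round` and the `step` parameter are illustrative stand-ins, not the authors' implementation or device model.

```python
import numpy as np

rng = np.random.default_rng(0)

def stochastic_round(update, step):
    """Round each update to an integer multiple of the device step size,
    rounding up with probability equal to the fractional remainder, so the
    expected applied update equals the requested one."""
    scaled = update / step            # update in units of the device step
    low = np.floor(scaled)
    round_up = rng.random(update.shape) < (scaled - low)
    return (low + round_up) * step

# An update of 0.001 with a step of 0.01 survives ~10% of the time,
# instead of always being flushed to zero by deterministic rounding.
tiny = np.full((4, 4), 0.001)
print(stochastic_round(tiny, step=0.01))
```

Averaged over many mini-batches, the applied change then matches the requested gradient in expectation, which is what avoids the vanishing-update problem for gradients smaller than the minimum programmable conductance change.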
Highlights
As artificial intelligence (AI) applications become ubiquitous in medical care, autonomous driving, robotics, and other fields, accuracy requirements and neural network complexity increase in tandem, requiring extensive hardware support for training
Our results show that streaming batch principal component analysis (PCA) and non-negative matrix factorization (NMF) decomposition algorithms can achieve near Mini-Batch Gradient Descent (MBGD) accuracy in a memristor-based multi-layer perceptron trained on the MNIST (Modified National Institute of Standards and Technology) database with only 3 to 10 ranks, at significant memory savings
We propose an expansion of MBGD for larger batch sizes in conjunction with two gradient decomposition methods - streaming batch principal component analysis (PCA) and non-negative matrix factorization (NMF) - and recomposition methods based on rank summation (rank-sum) vs. rank-by-rank update (rank-seq), applied to a network with realistic memristor hardware models (see the sketch after this list)
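To illustrate the two recomposition strategies, here is a hedged sketch contrasting rank-sum (rebuild the full gradient externally, then write it once) with rank-seq (apply each rank-1 outer product to the array in turn). A generic SVD-based factorization stands in for the paper's streaming batch PCA and NMF; all function names and shapes are hypothetical, chosen only to show the interface.

```python
import numpy as np

def low_rank_factors(grad, rank):
    """Generic low-rank factorization of a batch-averaged gradient.
    Stand-in for streaming batch PCA / NMF: any method producing
    rank-1 factor pairs (u_k, v_k) fits this interface."""
    u, s, vt = np.linalg.svd(grad, full_matrices=False)
    return [(u[:, k] * s[k], vt[k, :]) for k in range(rank)]

def update_rank_sum(weights, factors, lr):
    """rank-sum: reconstruct the full gradient outside the array
    (requires memory for one full matrix), then apply one update."""
    grad = sum(np.outer(u, v) for u, v in factors)
    weights -= lr * grad

def update_rank_seq(weights, factors, lr):
    """rank-seq: apply each rank-1 outer-product update directly,
    as a crossbar can do without storing the reconstructed gradient."""
    for u, v in factors:
        weights -= lr * np.outer(u, v)

rng = np.random.default_rng(0)
W = rng.standard_normal((784, 100))   # e.g., an MNIST input layer
G = rng.standard_normal(W.shape)      # batch-averaged gradient
update_rank_seq(W, low_rank_factors(G, rank=5), lr=0.01)
```

In exact arithmetic the two updates are equivalent; on hardware, rank-seq trades the memory needed for a fully reconstructed gradient against a sequence of per-rank array writes, which is why the low-rank behavior of NMF rank-seq matters for implementation.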
Summary
As artificial intelligence (AI) applications become ubiquitous in medical care, autonomous driving, robotics, and other fields, accuracy requirements and neural network complexity increase in tandem, requiring extensive hardware support for training. The use of such significant computing resources has major financial and environmental impacts (Nugent and Molter, 2014; Strubell et al., 2020). New neuro-inspired hardware alternatives are necessary for keeping up with increasing demands on complexity and energy efficiency. Emerging non-volatile memory (NVM) technologies, such as oxygen vacancy-driven resistive switches, known as ReRAM or memristors (Chang et al., 2011; Wong et al., 2012; Chen, 2020), can combine data processing and storage.