Abstract

Sparse coding provides a powerful means of feature extraction for high-dimensional data and has recently attracted broad interest for applications. However, realizing real-time, energy-efficient feature extraction on conventional CPUs/GPUs remains a major challenge owing to the algorithm's computational intensity. Benefiting from non-volatility, low power, high speed, good scalability, and compatibility with CMOS technology, spintronic devices, such as the magnetic tunnel junction and the domain wall motion (DWM) device, have been explored for applications ranging from memory and logic to neuromorphic computing. In this paper, we explore hardware acceleration of the sparse coding algorithm with spintronic devices through a series of design optimizations spanning the algorithm, architecture, circuit, and device levels. First, a sparse coding algorithm well suited to parallelization and hardware acceleration is selected. Then, a DWM-based compound spintronic device (CSD), envisioned to provide multiple conductance states, is engineered and modeled. Subsequently, a parallel architecture based on a dense cross-point array of the proposed DWM-based CSD is presented; with dedicated peripheral read and write circuitry, its massively parallel read and write operations accelerate the selected sparse coding algorithm. Experimental results show that the selected sparse coding algorithm is accelerated by 1400× with the proposed parallel architecture compared with a software implementation, while its energy dissipation is eight orders of magnitude lower. Additionally, an artificial neural circuit (ANC) built from the proposed DWM-based CSD is presented, which achieves a multi-step transfer function.
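As an illustrative sketch only (not the paper's implementation), the cross-point array's parallel read can be viewed as an analog matrix-vector multiply in which dictionary weights are quantized onto a small set of device conductance states. The function names, the number of levels, and the conductance range below are assumptions:

```python
import numpy as np

def quantize_to_conductance(W, levels=8, g_min=1e-6, g_max=8e-6):
    """Map real-valued dictionary weights in [0, 1] onto a small set of
    discrete conductance states, mimicking a multi-level DWM-based CSD
    (levels and conductance range are illustrative assumptions)."""
    W_clipped = np.clip(W, 0.0, 1.0)
    step = (g_max - g_min) / (levels - 1)
    idx = np.round(W_clipped * (levels - 1))
    return g_min + idx * step

def crosspoint_mvm(G, v_in):
    """One parallel read of the cross-point array: each column current is
    the dot product of the input voltage vector with that column's
    conductances (Ohm's law plus Kirchhoff's current law)."""
    return v_in @ G
```

In such an array every column current forms in parallel during a single read, which is the source of the speedup over sequential software multiply-accumulate loops.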
Using the proposed DWM-based CSD as a synapse and the proposed ANC as a neuron, a fully connected feedforward artificial neural network is constructed to classify the learned feature vectors, achieving a recognition error rate of 7.25%.
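A minimal sketch of the multi-step transfer function the ANC is described as realizing: a staircase activation applied to the summed synaptic input of a fully connected layer. The threshold values and layer shape here are illustrative assumptions, not the paper's design:

```python
import numpy as np

def multistep_activation(x, thresholds=(0.0, 0.5, 1.0, 1.5)):
    """Staircase transfer function: the output climbs one discrete level
    per crossed threshold (threshold values are assumed)."""
    x = np.atleast_1d(np.asarray(x, dtype=float))
    levels = np.sum(np.asarray(thresholds)[None, :] <= x[:, None], axis=1)
    return levels / len(thresholds)  # normalized to [0, 1]

def forward_layer(features, synapse_weights):
    """One fully connected feedforward layer: each synapse contributes a
    conductance-like weight, and each neuron applies the multi-step
    transfer function to its summed input."""
    return multistep_activation(features @ synapse_weights)
```

The discrete output levels stand in for the finite conductance states of the device, so the classifier operates on quantized rather than continuous activations.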
