Abstract
We propose, for the first time, a novel synaptic architecture based on NAND flash memory for highly robust, high-density quantized neural networks (QNNs) with 4-bit weights and binary neuron activations. The proposed synaptic architecture is fully compatible with the conventional NAND flash memory architecture by adopting a differential sensing scheme and a binary neuron activation of (1, 0). A binary neuron enables the use of a 1-bit sense amplifier, which significantly reduces the burden on peripheral circuits, lowers power consumption, and enables bitwise communication between the layers of the neural network. Operating the NAND cells in the saturation region eliminates the effect of metal wire resistance and the serial resistance of the NAND cells. With a read-verify-write (RVW) scheme, a low-variance conductance distribution is demonstrated for 8 levels. Vector-matrix multiplication (VMM) of a 4-bit weight and a binary activation can be accomplished with only one input pulse, eliminating the need for a multiplier and additional logic operations. In addition, quantization training minimizes the degradation of inference accuracy compared to post-training quantization. Finally, the low-variance conductance distribution of the NAND cells achieves higher inference accuracy than that of resistive random access memory (RRAM) devices, by 2–7% and 0.04–0.23% for the CIFAR-10 and MNIST datasets, respectively.
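The differential-sensing VMM described above can be illustrated with a small numerical sketch. This is not the paper's implementation, only an assumed model: each signed weight is split across a positive and a negative cell string, a binary activation of (1, 0) selects which word lines are pulsed, and a 1-bit sense amplifier thresholds the differential current. All sizes and names (`n_in`, `g_pos`, `g_neg`) are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes for illustration: 16 inputs, 4 output neurons.
n_in, n_out = 16, 4

# Signed 4-bit-style weights, realized as a differential pair of cell
# conductance levels (g_pos, g_neg); each cell holds one of the discrete
# levels set by the read-verify-write (RVW) scheme.
w = rng.integers(-7, 8, size=(n_in, n_out))
g_pos = np.maximum(w, 0)   # cells on the "positive" bit line
g_neg = np.maximum(-w, 0)  # cells on the "negative" bit line

# Binary input activations (1, 0): a word line is either pulsed or not,
# so the whole VMM needs only one input pulse per word line and no multiplier.
x = rng.integers(0, 2, size=n_in)

# Differential sensing: subtract the two bit-line current sums.
i_diff = x @ g_pos - x @ g_neg   # equals x @ w

# 1-bit sense amplifier: threshold the differential current to a binary
# output activation, ready for bitwise communication to the next layer.
y = (i_diff > 0).astype(int)

print(i_diff, y)
```

Because the subtraction happens in the analog domain, the downstream circuit only ever sees the 1-bit comparison result, which is what keeps the peripheral circuitry small.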
Highlights
A novel synaptic string architecture based on NAND flash memory for highly robust, high-density quantized neural networks (QNNs) with binary neuron activation is proposed for the first time
The differential sensing scheme and the neuron activation of (1, 0) instead of (1, −1) are fully compatible with the conventional NAND flash memory architecture, which consists of cell strings
Summary
Deep neural networks (DNNs) have achieved remarkable success on various intelligent tasks, such as speech recognition, computer vision, and natural language processing [1]–[3]. Recent state-of-the-art DNNs demand large network sizes and huge numbers of parameters, which require very fast graphics processing units (GPUs), enormous memory storage, and large computational power [4], [5]. The von Neumann bottleneck results in enormous energy and time consumption when performing VMM operations because of the large amount of data movement between memory and processor. Neuromorphic systems have been actively investigated as a solution to the von Neumann bottleneck, utilizing in-memory computing with a synaptic array architecture. A synaptic device array can perform VMM in a single time step, which is orders of magnitude more efficient than the conventional von Neumann architecture [6].
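The abstract's claim that quantization training degrades accuracy less than post-training quantization can be sketched with a toy example. This is an assumed illustration, not the paper's training procedure: a scalar output is fit with 4-bit uniformly quantized weights, where quantization-aware training uses the quantized weights in the forward pass but updates the underlying float weights (a straight-through estimator). The function name `quantize_4bit`, the learning rate, and the target are all hypothetical.

```python
import numpy as np

def quantize_4bit(w, scale):
    """Uniform symmetric quantization to integer levels in [-7, 7]."""
    return np.clip(np.round(w / scale), -7, 7) * scale

# Post-training quantization: train in float, then quantize once.
w_float = np.array([0.31, -0.52, 0.07, 0.44])
scale = np.abs(w_float).max() / 7
w_ptq = quantize_4bit(w_float, scale)

# Quantization-aware training: the forward pass sees quantized weights,
# while gradients update the float weights (straight-through estimator),
# so training can compensate for the quantization error.
x = np.array([1.0, 0.0, 1.0, 1.0])   # binary (1, 0) activations
lr, target = 0.1, 1.0
for _ in range(50):
    w_q = quantize_4bit(w_float, scale)
    y = x @ w_q                       # forward pass with quantized weights
    grad_y = y - target               # gradient of 0.5 * (y - target)^2
    w_float -= lr * grad_y * x        # STE: gradient flows to float weights

print(x @ quantize_4bit(w_float, scale))  # close to target
```

The quantized forward pass lets the optimizer steer the float weights so that, after rounding, the network output lands within a quantization step of the target, which is the mechanism behind the accuracy advantage of quantization training.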