An Accurate, Error-Tolerant, and Energy-Efficient Neural Network Inference Engine Based on SONOS Analog Memory

T Patrick Xiao, Sapan Agarwal, Ben Feinberg, Ramesh Chettuvetty, Prashant Saxena, Christopher H Bennett, Vineet Agrawal, Matthew J Marinella, Venkatraman Prabhakar, Vijay Raghavan, Krishnaswamy Ramkumar, Harsha Medu

doi:10.1109/tcsi.2021.3134313

Abstract

We demonstrate SONOS (silicon-oxide-nitride-oxide-silicon) analog memory arrays that are optimized for neural network inference. The devices are fabricated in a 40 nm process and operated in the subthreshold regime for in-memory matrix multiplication. Subthreshold operation allows low conductances to be programmed with low error. This is a good match for neural networks, whose weight distributions are typically heavily skewed toward near-zero values, and it leads to high accuracy in the presence of programming errors and process variations. We simulate the end-to-end neural network inference accuracy, accounting for the measured programming error, read noise, and retention loss in a fabricated SONOS array. Evaluated on the ImageNet dataset using ResNet50, the accuracy of a SONOS-based system is within 2.16% of floating-point accuracy without any retraining. The unique error properties and high On/Off ratio of the SONOS device allow scaling to large arrays without bit slicing. They also enable an inference architecture that achieves 20 TOPS/W on ResNet50, a >10× gain in energy efficiency over state-of-the-art digital and analog inference accelerators.
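The core idea above can be illustrated with a small pure-Python simulation. Signed weights are stored as device conductances, the matrix-vector product is read out as summed currents, and programming error is smaller at low conductance. This is a minimal sketch, not the authors' simulator. The differential-pair mapping, the conductance-proportional error model, and the names `map_weights`, `program`, `analog_mvm`, and `alpha` are hypothetical and were chosen only for illustration.

```python
import random


def map_weights(W, g_min, g_max):
    """Hypothetical mapping: each signed weight is normalized by the largest
    |w| and stored on a differential pair of devices (G_plus, G_minus).
    The device that does not carry the weight is left at g_min."""
    w_max = max(abs(w) for row in W for w in row) or 1.0  # avoid divide-by-zero
    span = g_max - g_min
    g_plus = [[g_min + span * max(w, 0.0) / w_max for w in row] for row in W]
    g_minus = [[g_min + span * max(-w, 0.0) / w_max for w in row] for row in W]
    return g_plus, g_minus, w_max


def program(G, alpha, rng):
    """Hypothetical programming-error model: Gaussian error whose standard
    deviation is proportional to the target conductance, so devices near
    zero conductance are programmed almost exactly. Negative results are
    clipped to 0 because a conductance cannot be negative."""
    return [[max(0.0, g + rng.gauss(0.0, alpha * g)) for g in row] for row in G]


def analog_mvm(W, x, g_min=0.0, g_max=1.0, alpha=0.0, seed=0):
    """Simulate y = W x as one analog read of the array.

    Each row of W drives one output line, and its result is the summed
    differential current sum_j (G_plus - G_minus) * x_j, rescaled back to
    weight units. The g_min offset cancels in the difference."""
    rng = random.Random(seed)
    g_plus, g_minus, w_max = map_weights(W, g_min, g_max)
    g_plus = program(g_plus, alpha, rng)
    g_minus = program(g_minus, alpha, rng)
    scale = w_max / (g_max - g_min)
    y = []
    for gp_row, gm_row in zip(g_plus, g_minus):
        current = sum((gp - gm) * xi for gp, gm, xi in zip(gp_row, gm_row, x))
        y.append(current * scale)
    return y


if __name__ == "__main__":
    W = [[0.5, -0.02, 0.0], [0.01, 0.8, -0.3]]
    x = [1.0, 2.0, 3.0]
    print(analog_mvm(W, x))              # error-free: matches W x
    print(analog_mvm(W, x, alpha=0.05))  # with conductance-dependent error
```

With `alpha=0`, the output equals the exact product `W x` up to floating-point rounding. With `alpha > 0` and `g_min = 0`, zero and near-zero weights sit on devices close to zero conductance, so under this assumed error model they contribute little or no error. Raising `g_min`, which corresponds to a lower On/Off ratio, injects error even into zero weights.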
