Abstract

Convolutional neural networks (CNNs) have proven highly effective in a wide range of practical artificial intelligence (AI) applications. However, CNNs grow deeper as user applications become more sophisticated, resulting in a huge number of operations and increased memory requirements. The massive amount of intermediate data produced leads to intensive data movement between memory and computing cores, creating a real bottleneck. In-memory computing (IMC) aims to address this bottleneck by computing directly inside memory, eliminating energy-intensive and time-consuming data movement. Meanwhile, the emerging binary neural networks (BNNs), a special case of CNNs, exhibit a number of hardware-friendly properties, including memory saving. In a BNN, the costly floating-point multiply-and-accumulate is replaced with lightweight bitwise XNOR and popcount operations. In this article, we propose a programmable IMC architecture targeting efficient implementation of BNNs. Computational memories based on the recently introduced memristor overwrite logic (MOL) design style are employed. The architecture, presented in semiparallel and parallel models, efficiently executes the advanced quantization algorithm of the XNOR-Net BNN. Performance evaluation on the CIFAR-10 dataset demonstrates between $1.24\times$ and $3\times$ speedup and between 49% and 99% energy saving compared with state-of-the-art implementations, as well as a throughput efficiency of up to 273 images/s/W.
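
To make the XNOR/popcount substitution concrete, the sketch below shows the standard XNOR-Net binary dot product in software. This is an illustration of the general technique named in the abstract, not the paper's MOL-based in-memory implementation; the encoding convention (bit 1 for +1, bit 0 for -1) and the use of the GCC/Clang intrinsic `__builtin_popcountll` are assumptions for this example.

```c
#include <stdint.h>
#include <stdio.h>

/* Binary dot product of two 64-element vectors whose entries are in
 * {-1, +1}, packed one element per bit (bit 1 -> +1, bit 0 -> -1).
 * The floating-point multiply-and-accumulate collapses to one XNOR
 * and one popcount:
 *   dot = matches - mismatches = 2 * popcount(xnor) - N
 */
static int binary_dot64(uint64_t a, uint64_t w)
{
    uint64_t xnor = ~(a ^ w);                  /* bit i is 1 iff a_i == w_i */
    int matches = __builtin_popcountll(xnor);  /* count agreeing positions  */
    return 2 * matches - 64;                   /* back to {-1,+1} arithmetic */
}

int main(void)
{
    uint64_t a = 0xF0F0F0F0F0F0F0F0ULL;  /* hypothetical activation bits */
    uint64_t w = 0xFF00FF00FF00FF00ULL;  /* hypothetical weight bits */
    /* 32 bits agree and 32 differ, so the dot product is 0. */
    printf("dot = %d\n", binary_dot64(a, w));
    return 0;
}
```

The appeal for hardware is visible even in this software form: the inner loop of a convolution reduces to wide bitwise operations and population counts, which is what makes a bitwise in-memory fabric such as the proposed MOL-based design a natural fit.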
