Abstract

Although deep neural networks (DNNs) have become the cornerstone of artificial intelligence, training a DNN still requires dozens of CPU hours. Prior works have created various customized hardware accelerators for DNNs; however, most of these accelerators are designed to accelerate DNN inference and lack basic support for the complex compute phases and sophisticated data dependencies involved in DNN training. The major challenges in supporting DNN training arise from multiple layers of the system stack: (1) the current de facto training method, error backpropagation (BP), requires all weights and intermediate data to be stored in memory and then consumed sequentially in the backward pass, so weight updates are non-local and rely on upstream layers, which makes parallelizing training extremely challenging and incurs significant memory and compute overheads; (2) the power consumption of such CMOS accelerators can reach 200-250 W, and although designs based on emerging memory technologies have demonstrated great potential for low-power DNN acceleration, their power efficiency is bottlenecked by CMOS analog-to-digital converters (ADCs). In this work, we review recent advances in accelerator designs for DNNs and point out their limitations. We then set out to address these challenges by combining innovations in the training algorithm, circuits, and accelerator architecture. Our research follows the Processing-in-Memory (PIM) strategy. Specifically, we leverage the recently proposed Direct Feedback Alignment (DFA) training algorithm to overcome the long-range data dependency imposed by BP, and we execute DNN training in parallel in a specially designed pipeline. We implement the proposed architecture using Ferroelectric Field-Effect Transistors (FeFETs), owing to their high performance and low-power operation. To further improve power efficiency, we propose a random number generator (RNG) and an ultra-low-power FeFET-based ADC. Preliminary results suggest the feasibility and promise of our approach for low-power and highly parallel DNN training in a broad range of applications.
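
The data-dependency contrast between BP and DFA can be sketched with the standard update rules (a minimal sketch following the common BP/DFA formulations; the notation here is ours, not the paper's):

\[
\text{BP:}\quad \delta_\ell = \left(W_{\ell+1}^{\top}\,\delta_{\ell+1}\right) \odot f'(z_\ell),
\qquad
\text{DFA:}\quad \delta_\ell = \left(B_\ell\, e\right) \odot f'(z_\ell),
\]

where \(z_\ell\) is the pre-activation at layer \(\ell\), \(f'\) the activation derivative, \(e\) the error at the output layer, and \(B_\ell\) a fixed random feedback matrix. Under BP, \(\delta_\ell\) depends recursively on \(\delta_{\ell+1}\), so updates must proceed layer by layer; under DFA, every \(\delta_\ell\) depends only on the broadcast error \(e\), which removes the long-range dependency described above.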

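The locality of DFA updates, which the proposed pipeline exploits, can be illustrated with a minimal NumPy sketch (an illustrative toy under our own assumptions, not the paper's implementation; the layer widths, learning rate, and helper names are hypothetical):

```python
import numpy as np

rng = np.random.default_rng(0)
sizes = [784, 256, 128, 10]                       # hypothetical layer widths
Ws = [rng.normal(0.0, 0.05, (m, n)) for n, m in zip(sizes[:-1], sizes[1:])]
Bs = [rng.normal(0.0, 0.05, (m, sizes[-1])) for m in sizes[1:-1]]  # fixed random feedback

def tanh_prime(z):
    return 1.0 - np.tanh(z) ** 2

def dfa_step(x, y, lr=0.01):
    # Forward pass, keeping pre-activations for the derivative terms.
    zs, hs = [], [x]
    for W in Ws:
        zs.append(W @ hs[-1])
        hs.append(np.tanh(zs[-1]))
    e = hs[-1] - y                                # output error (MSE gradient)
    # DFA: every layer's error signal comes directly from e -- no layer
    # waits on gradients flowing back through upstream weights.
    for l, W in enumerate(Ws):
        feedback = e if l == len(Ws) - 1 else Bs[l] @ e
        delta = feedback * tanh_prime(zs[l])
        W -= lr * np.outer(delta, hs[l])          # local, parallelizable update

x = rng.normal(size=sizes[0])
y = np.eye(sizes[-1])[3]                          # one-hot target
dfa_step(x, y)
```

Because each hidden layer's error term is formed from the broadcast error through its own fixed feedback matrix, all layer updates can in principle proceed concurrently once the output error is available; this is the property a pipelined, PIM-based training architecture can build on.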