Abstract

Although deep neural networks (DNNs) have become the cornerstone of artificial intelligence, training a DNN still requires dozens of CPU hours. Prior works have created various customized hardware accelerators for DNNs; however, most of these accelerators are designed to accelerate DNN inference and lack basic support for the complex compute phases and sophisticated data dependencies involved in DNN training. The major challenges in supporting DNN training arise from multiple layers of the system stack: (1) the current de facto training method, error backpropagation (BP), requires all weights and intermediate data to be stored in memory and then consumed sequentially in the backward pass, so weight updates are non-local and rely on upstream layers, which makes parallelizing training extremely challenging and incurs significant memory and compute overheads; (2) the power consumption of such CMOS accelerators can reach 200-250 W, and although designs based on emerging memory technologies have demonstrated great potential for low-power DNN acceleration, their power efficiency is bottlenecked by CMOS analog-to-digital converters (ADCs). In this work, we review recent advances in accelerator designs for DNNs and point out their limitations. We then set out to address these challenges by combining innovations in the training algorithm, circuits, and accelerator architecture. Our research follows the Processing-in-Memory (PIM) strategy. Specifically, we leverage the recently proposed Direct Feedback Alignment (DFA) training algorithm to overcome the long-range data dependency imposed by BP, and we execute DNN training in parallel in a specially designed pipeline. We implement the proposed architecture using Ferroelectric Field-Effect Transistors (FeFETs), owing to their high performance and low-power operation. To further improve power efficiency, we propose a random number generator (RNG) and an ultra-low-power FeFET-based ADC. Preliminary results suggest the feasibility and promise of our approach for low-power and highly parallel DNN training in a broad range of applications.
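
The data-dependency contrast between BP and DFA can be sketched with the standard update rules (a minimal sketch following the common BP/DFA formulations; the notation here is ours, not the paper's):

\[
\text{BP:}\quad \delta_\ell = \left(W_{\ell+1}^{\top}\,\delta_{\ell+1}\right) \odot f'(z_\ell),
\qquad
\text{DFA:}\quad \delta_\ell = \left(B_\ell\, e\right) \odot f'(z_\ell),
\]

where \(z_\ell\) is the pre-activation at layer \(\ell\), \(f'\) the activation derivative, \(e\) the error at the output layer, and \(B_\ell\) a fixed random feedback matrix. Under BP, \(\delta_\ell\) depends recursively on \(\delta_{\ell+1}\), so updates must proceed layer by layer; under DFA, every \(\delta_\ell\) depends only on the broadcast error \(e\), which removes the long-range dependency described above.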

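The locality of DFA updates, which the proposed pipeline exploits, can be illustrated with a minimal NumPy sketch (an illustrative toy under our own assumptions, not the paper's implementation; the layer widths, learning rate, and helper names are hypothetical):

```python
import numpy as np

rng = np.random.default_rng(0)
sizes = [784, 256, 128, 10]                       # hypothetical layer widths
Ws = [rng.normal(0.0, 0.05, (m, n)) for n, m in zip(sizes[:-1], sizes[1:])]
Bs = [rng.normal(0.0, 0.05, (m, sizes[-1])) for m in sizes[1:-1]]  # fixed random feedback

def tanh_prime(z):
    return 1.0 - np.tanh(z) ** 2

def dfa_step(x, y, lr=0.01):
    # Forward pass, keeping pre-activations for the derivative terms.
    zs, hs = [], [x]
    for W in Ws:
        zs.append(W @ hs[-1])
        hs.append(np.tanh(zs[-1]))
    e = hs[-1] - y                                # output error (MSE gradient)
    # DFA: every layer's error signal comes directly from e -- no layer
    # waits on gradients flowing back through upstream weights.
    for l, W in enumerate(Ws):
        feedback = e if l == len(Ws) - 1 else Bs[l] @ e
        delta = feedback * tanh_prime(zs[l])
        W -= lr * np.outer(delta, hs[l])          # local, parallelizable update

x = rng.normal(size=sizes[0])
y = np.eye(sizes[-1])[3]                          # one-hot target
dfa_step(x, y)
```

Because each hidden layer's error term is formed from the broadcast error through its own fixed feedback matrix, all layer updates can in principle proceed concurrently once the output error is available; this is the property a pipelined, PIM-based training architecture can build on.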