Abstract

Implementing edge artificial intelligence (AI) inference and training is challenging with current memory technologies. As deep neural networks (DNNs) grow in size, this problem is only getting worse. This article presents CHIMERA, the first non-volatile DNN chip for both edge AI training and inference using foundry on-chip resistive RAM (RRAM) macros and no off-chip memory, fabricated in 40-nm CMOS. CHIMERA's DNN accelerator is specifically optimized for RRAM and achieves 0.92-TOPS peak performance and 2.2-TOPS/W energy efficiency. We scale inference up to $6\times$ larger DNNs by connecting six CHIMERAs in an illusion system with just 4% overhead in measured execution time and 5% in energy, enabled by communication-sparse DNN mappings that exploit RRAM non-volatility through quick chip wake-up and shutdown ($<33~\mu\text{s}$). Our incremental edge AI training algorithm, called low-rank training, overcomes RRAM write energy, speed, and endurance challenges and achieves the same accuracy as traditional algorithms with up to $283\times$ fewer RRAM weight update steps and $340\times$ better energy-delay product. Combined with ENDUrance REsiliency using random Remapping (ENDURER), a hardware module that provides resilience to write endurance failures, we enable ten years of 20-samples/min incremental edge AI training.
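To make the low-rank training idea concrete, the following is a minimal sketch, not the authors' measured implementation: it assumes the accumulated weight gradient is replaced by its best rank-k approximation (truncated SVD) so that each update commits only k rank-1 components to RRAM, reducing per-step writes from O(mn) to O(k(m+n)). The function name, rank, and learning rate are illustrative.

```python
# Illustrative sketch of a rank-limited weight update (hypothetical names;
# not the exact CHIMERA algorithm). A truncated SVD keeps the top-`rank`
# singular components of the accumulated gradient, so far fewer RRAM cells
# are rewritten per training step than with a dense full-matrix update.
import numpy as np

def low_rank_update(weight, grad_accum, lr=0.01, rank=4):
    """Apply a rank-limited update to RRAM-resident weights."""
    u, s, vt = np.linalg.svd(grad_accum, full_matrices=False)
    # Keep only the top-`rank` singular components of the gradient.
    low_rank_grad = (u[:, :rank] * s[:rank]) @ vt[:rank, :]
    # Only this low-rank delta is written back to the weight array.
    return weight - lr * low_rank_grad

# Toy usage: a 64x64 weight matrix and a dense accumulated gradient.
rng = np.random.default_rng(0)
w = rng.standard_normal((64, 64))
g = rng.standard_normal((64, 64))
w_new = low_rank_update(w, g, lr=0.01, rank=4)
```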
