Abstract

As AI applications become pervasive on edge devices, deep neural network (DNN) models are increasingly required to learn new tasks incrementally. In this article, we propose AILC, a compute-in-memory (CIM)-based accelerator for on-chip incremental learning built on STT-MRAM technology. On the software side, a network-expansion-based low-precision training algorithm is proposed for incremental learning, in which the loss function is modified to handle the unbalanced training dataset. On the hardware side, the detailed CIM accelerator design for incremental learning is presented. A workload-aware hardware resource assignment protocol is proposed to improve throughput when the workload of weight gradient calculation is low. Software simulation results on the CIFAR-100 dataset show that the proposed algorithm effectively supports incremental learning even in the presence of device conductance variation. System-level benchmarks show that AILC achieves 147×, 3.7×~28.7×, and 2.05×~2.9× higher energy efficiency than an NVIDIA Titan V GPU, RRAM-based CIM accelerators, and an edge TPU/GPU, respectively. Compared to the baselines, the hardware resource assignment protocol improves the throughput of AILC by 2.0×~2.2× on average, yielding 4.1×~21.4× higher throughput than the edge TPU/GPU.
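The abstract does not spell out how the loss function is modified for the unbalanced training set. As an illustrative aside, a common recipe in class-incremental learning, where only a few exemplars of old classes remain alongside many new-class samples, is to reweight the cross-entropy loss by inverse class frequency. The sketch below is a minimal PyTorch example of that generic technique; the function name `balanced_ce_loss` and the sample counts are hypothetical and not taken from the paper.

```python
# Illustrative sketch only: inverse-frequency reweighted cross-entropy,
# a common way to compensate for class imbalance in incremental learning.
# This is NOT necessarily AILC's actual loss modification.
import torch
import torch.nn.functional as F

def balanced_ce_loss(logits: torch.Tensor,
                     targets: torch.Tensor,
                     class_counts: torch.Tensor) -> torch.Tensor:
    """Cross-entropy with per-class weights inversely proportional to
    each class's frequency in the (unbalanced) training set."""
    weights = class_counts.sum() / (len(class_counts) * class_counts.float())
    return F.cross_entropy(logits, targets, weight=weights)

# Toy usage: 100 classes (as in CIFAR-100); old classes are represented
# by far fewer stored exemplars than the newly arriving classes.
logits = torch.randn(8, 100)
targets = torch.randint(0, 100, (8,))
counts = torch.cat([torch.full((50,), 20),    # 50 old classes, few exemplars
                    torch.full((50,), 500)])  # 50 new classes, full data
loss = balanced_ce_loss(logits, targets, counts)
```

With this weighting, a misclassified exemplar from a rare old class contributes more to the gradient than a sample from an abundant new class, which counteracts the bias toward newly learned tasks.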
