Abstract

Due to the non-volatile nature of Resistive RAM (ReRAM), dynamic operations in the arrays account for a much larger share of power in ReRAM-based neural networks than static power does. To reduce the dynamic power of in-situ operations on neural parameters, precision tuning is considered a viable approximate-computing approach that trades excess computational exactness for power and efficiency gains. However, the switching overhead of precision tuning in hardware severely limits its effectiveness when systems must react quickly to changes in environment, user constraints, or input quality. This work investigates, for the first time, the feasibility of agile precision tuning that lets neural network accelerators benefit from approximate computing. The proposed Computing-in-Memory (CiM) CNN accelerators fully exploit the normally-off characteristic of memristor crossbars to achieve instant network precision tuning without incurring a model-reloading penalty. Combined with the proposed neural parameter mapping policy and a novel mixed-model training method, the ReRAM-based accelerator incurs negligible precision-switching latency and power consumption compared with traditional variable-precision accelerators. In evaluations with state-of-the-art workloads, the proposed ReRAM DNN architecture saves 58.3%-62.47% of area overhead over the baseline design. We also leverage the proposed ReRAM accelerator architecture to build a novel always-on Keyword Spotting (KWS) system. The KWS design can switch between different precision modes to capture relevant sounds with high accuracy. Experimental results show that the precision-adjustable KWS architecture saves considerable operating energy when fed realistic audio test sets.
