Abstract

Precision tuning is a popular approximate-computing technique that trades excess computational exactness for gains in power and efficiency. In particular, it has proved useful for reducing the computation and memory overhead of deep neural networks in embedded and IoT settings. However, the hardware overhead of switching precision severely limits its applicability and its ability to save energy by reacting quickly to changes in the environment, user constraints, or input quality. This work is the first to investigate the feasibility of agile, cost-free precision tuning for neural network accelerators that benefit from approximate computing. The proposed processing-in-memory (PIM) CNN accelerators fully exploit the normally-off characteristics of memristor crossbars to achieve instant precision tuning without incurring a model-reloading penalty. With the proposed neural-parameter mapping policy and a novel mixed-model training method, the ReRAM-based accelerator incurs negligible precision-switching latency and power consumption compared with conventional variable-precision accelerators. The mixed-model training unifies neural models of different precisions in a single ReRAM array without compromising accuracy, and the accelerator saves 58.3%-62.47% of area compared with conventional designs, which must program multiple independent models into ReRAM arrays to support precision tuning.
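To make the key idea concrete, the sketch below illustrates one plausible way a single ReRAM array could serve models of several precisions: decompose each quantized weight into per-bit slices (one slice per 1-bit cell column) and let a lower-precision model reuse the most-significant slices of the full-precision model, so dropping to low precision means reading fewer slices rather than reprogramming cells. This is a minimal illustration under assumed details, not the paper's exact mapping policy; the quantization scheme and the helper names quantize and to_bit_slices are hypothetical.

    import numpy as np

    def quantize(w, bits):
        # Uniform symmetric quantization to `bits` bits (assumed scheme).
        scale = np.max(np.abs(w)) / (2 ** (bits - 1) - 1)
        q = np.clip(np.round(w / scale),
                    -(2 ** (bits - 1)), 2 ** (bits - 1) - 1).astype(np.int32)
        return q, scale

    def to_bit_slices(q, bits):
        # Decompose signed integer weights into 0/1 slices (offset-binary),
        # mimicking how a multi-bit weight is spread over 1-bit ReRAM cells.
        u = q + 2 ** (bits - 1)                       # shift to unsigned
        return [(u >> b) & 1 for b in range(bits - 1, -1, -1)]  # MSB first

    # An 8-bit model whose four MSB slices double as a 4-bit model.
    w = np.random.randn(4, 4).astype(np.float32)
    q8, s8 = quantize(w, 8)
    slices = to_bit_slices(q8, 8)

    # Low-precision inference reads only the 4 MSB slices already mapped
    # into the array; no cells are reprogrammed when precision switches.
    q4 = sum(s << (3 - i) for i, s in enumerate(slices[:4]))
    signed_q4 = q4 - 2 ** 3  # back to a signed 4-bit weight code

Under this assumed mapping, simple truncation to the MSB slices would lose accuracy, which is presumably why the paper's mixed-model training is needed: it would fine-tune the shared weights so that both the full-precision and truncated views of the array remain accurate.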
