Abstract

Computationally intensive inference tasks of deep neural networks have driven a revolution in accelerator architecture, aiming to reduce both power consumption and latency. The key figure of merit for hardware inference accelerators is the number of multiply-and-accumulate operations per watt (MACs/W); the state of the art has so far reached several hundred Giga-MACs/W. We propose a Tera-MACs/W neural hardware inference accelerator (TMA) with 8-bit activations and scalable integer weights of less than one byte. The architecture's main feature is a configurable neural processing element for matrix-vector operations. The proposed neural processing element is a massively parallel processor that operates without multipliers, which makes it attractive for energy-efficient, high-performance neural network applications. We benchmark our system's latency, power, and performance using AlexNet trained on ImageNet. Finally, we compare our accelerator's throughput and power consumption to those of prior works. The proposed accelerator outperforms state-of-the-art counterparts in terms of energy and area efficiency, achieving 2.3 TMACs/W at 1.0 V on a 28-nm Virtex-7 FPGA.
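To make the "multiplier-less" matrix-vector operation concrete, a common way to remove multipliers is to constrain each sub-byte weight to a signed power of two, so that every product of an 8-bit activation and a weight reduces to an arithmetic shift followed by an add or subtract. The abstract does not state the paper's exact weight encoding, so the sketch below is only an illustrative assumption; the struct `po2_weight_t` and the function `dot_shift_add` are hypothetical names introduced here for the example.

```c
#include <stdint.h>
#include <stdio.h>

/* Minimal sketch of a multiplier-less MAC, assuming weights are encoded as
 * signed powers of two (sign + shift amount). This encoding is an
 * illustrative assumption, not necessarily the paper's scheme. */

typedef struct {
    int8_t  sign;   /* +1 or -1 */
    uint8_t shift;  /* weight magnitude is 2^shift */
} po2_weight_t;

/* Dot product of 8-bit activations with power-of-two weights:
 * each "multiply" is just a left shift, accumulated in 32 bits. */
static int32_t dot_shift_add(const uint8_t *act, const po2_weight_t *w, int n)
{
    int32_t acc = 0;
    for (int i = 0; i < n; ++i) {
        int32_t term = (int32_t)act[i] << w[i].shift;  /* activation * 2^shift */
        acc += (w[i].sign >= 0) ? term : -term;
    }
    return acc;
}

int main(void)
{
    uint8_t      act[4] = { 10, 20, 30, 40 };
    po2_weight_t w[4]   = { {+1, 0}, {-1, 1}, {+1, 2}, {-1, 3} };
    /* Expected: 10*1 - 20*2 + 30*4 - 40*8 = -230 */
    printf("acc = %d\n", dot_shift_add(act, w, 4));
    return 0;
}
```

In hardware, each such term maps to a barrel shifter and an adder rather than a full multiplier array, which is the kind of simplification that underlies the energy-efficiency claim.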
