Abstract

A binary neural network (BNN) chip explores the limits of energy efficiency and computational density for an all-digital deep neural network (DNN) inference accelerator. The chip intersperses data storage and computation using computation near memory (CNM) to reduce interconnect and data movement costs. It performs wide inner product operations to leverage the parallelism inherent in DNN computations. The BNN chip uses lightweight pipelining at a near-threshold voltage (NTV) to reduce the overhead of sequential elements. It employs optimized data access patterns to reduce memory accesses for convolutional operations with pooling layers. The combination of these techniques enables the BNN chip to achieve a peak energy efficiency of 617 TOPS/W. The digital BNN chip approaches the energy efficiency of analog in-memory techniques while also ensuring deterministic, scalable, and bit-accurate operation. Moreover, the all-digital design leverages process scaling and does not require additional memory transistors or passive devices to attain a peak compute density of 418 TOPS/mm² and a memory density of 414 KB/mm². The binary design is extended to enable bit-serial integer precision operation with a reconfigurable 1-b multiplication circuit and element-wise partial sum shift and accumulate. This technique allows for fine-grain mixed precision and retains energy efficiency by exploiting the parallelism inherent in DNNs. The bit-serial binary operation allows for bit-accurate operation and high DNN accuracy that multibit analog compute-in-memory designs struggle to attain. It provides favorable energy tradeoffs compared with small-integer digital DNN accelerators.
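
The two arithmetic ideas named in the abstract, a wide binary inner product and bit-serial integer operation with shift and accumulate, can be illustrated in software. The following Python sketch is not the chip's circuit. It assumes the common BNN encoding in which bit 1 stands for +1 and bit 0 for -1, so that the 1-b multiply is an XNOR, and it assumes unsigned integer operands for the bit-serial case, so that the 1-b multiply becomes an AND. The function names (`pack`, `popcount`, `binary_dot`, `bit_serial_dot`) and these encodings are hypothetical choices made for illustration and do not come from the paper.

```python
def popcount(word):
    """Count the set bits of a non-negative Python int."""
    return bin(word).count("1")


def pack(bits):
    """Pack a list of 0/1 values into one int (element k goes to bit k)."""
    word = 0
    for k, b in enumerate(bits):
        word |= (b & 1) << k
    return word


def binary_dot(w_bits, x_bits):
    """Inner product of two {-1,+1} vectors stored as bits.

    Assumed encoding (not from the paper): bit 1 -> +1, bit 0 -> -1.
    """
    n = len(w_bits)
    mask = (1 << n) - 1
    # XNOR marks the positions where the two signs agree (product = +1).
    agree = ~(pack(w_bits) ^ pack(x_bits)) & mask
    # agreements contribute +1, disagreements -1: p - (n - p) = 2p - n
    return 2 * popcount(agree) - n


def bit_serial_dot(w, x, w_prec, x_prec):
    """Inner product of unsigned integer vectors, one bit-plane pair at a time.

    Assumption: operands are unsigned, so the 1-b multiply is an AND.
    Each 1-b partial sum is shifted by its bit significance and accumulated.
    """
    acc = 0
    for i in range(w_prec):
        w_plane = pack([(v >> i) & 1 for v in w])
        for j in range(x_prec):
            x_plane = pack([(v >> j) & 1 for v in x])
            partial = popcount(w_plane & x_plane)  # wide 1-b inner product
            acc += partial << (i + j)              # shift and accumulate
    return acc


if __name__ == "__main__":
    print(binary_dot([1, 0, 1, 1], [1, 1, 0, 1]))
    print(bit_serial_dot([3, 1, 2], [2, 3, 1], w_prec=2, x_prec=2))
```

In this sketch, precision is set per call through `w_prec` and `x_prec`. This is one way to picture fine-grain mixed precision: every step reuses the same wide 1-b inner product, and only the number of bit-plane passes changes.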
