Abstract

A pipelined wired-logic deep neural network (DNN) processor implemented in a 16-nm field-programmable gate array (FPGA) is presented. The wired-logic architecture minimizes the latency and power required for memory access, enabling low-power, high-throughput operation. One technical issue with the wired-logic architecture is that it requires substantial hardware resources. To reduce them, two core technologies are developed: (1) a convolutional non-linear neural network (CNNN) and (2) a pipeline-type neuron cell. The CNNN optimizes both the network structure and the non-linear activation function of each neuron by using a newly developed back-propagation-based training method. Whereas conventional reinforcement learning can train only a small network, limiting its application to handwritten-digit recognition, the proposed CNNN enables a larger network size, making it applicable to object recognition. The pipeline-type neuron cell uses a small look-up table (LUT) to process non-linear functions with only a small amount of hardware resources. These two technologies enable the implementation of the entire network on a single FPGA chip with the wired-logic architecture. Three types of CNNN trained on the CIFAR-10 dataset are implemented in 16-nm FPGAs. Energy efficiencies of 0.09, 0.12, and 0.61 μJ/frame are achieved with 70%, 75%, and 82% accuracy, respectively. Compared with a state-of-the-art accelerator using a binary neural network (BNN), the energy efficiency is improved by more than two orders of magnitude.
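The pipeline-type neuron cell replaces arithmetic evaluation of the activation with a table lookup: the non-linear function is precomputed into a small ROM indexed by a quantized pre-activation value. As an illustration only (the abstract does not specify the quantization width, input range, table contents, or function names, so the 4-bit width, tanh activation, and all identifiers below are assumptions), such an LUT-based activation might be sketched as:

```python
# Illustrative sketch of an LUT-based non-linear activation, the idea behind
# a pipeline-type neuron cell. The 4-bit width, [-4, 4] range, and tanh
# choice are assumptions for illustration, not the paper's actual design.
import math

BITS = 4                  # assumed quantization width of the pre-activation
LEVELS = 1 << BITS        # 16 table entries -> a tiny LUT per neuron
X_MIN, X_MAX = -4.0, 4.0  # assumed input range covered by the table
STEP = (X_MAX - X_MIN) / (LEVELS - 1)

# Precompute the table once, offline (i.e., at training/synthesis time).
LUT = [math.tanh(X_MIN + i * STEP) for i in range(LEVELS)]

def quantize(x: float) -> int:
    """Map a real pre-activation onto a 4-bit table index, clamping at the ends."""
    i = round((x - X_MIN) / STEP)
    return max(0, min(LEVELS - 1, i))

def lut_activation(x: float) -> float:
    """Evaluate the non-linear function by lookup instead of arithmetic."""
    return LUT[quantize(x)]
```

In hardware, the table would be a 16-entry ROM read in a single pipeline stage, so no multipliers or iterative function units are needed per neuron.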
