Abstract

This work presents a dynamically reconfigurable architecture for Neural Network (NN) accelerators implemented on a Field-Programmable Gate Array (FPGA) that can be applied in a variety of application scenarios. Although the concept of Dynamic Partial Reconfiguration (DPR) is increasingly used in NN accelerators, the resulting throughput is usually lower than that of purely static designs. This work presents a dynamically reconfigurable, energy-efficient accelerator architecture that does not sacrifice throughput. The proposed accelerator comprises reconfigurable processing engines and dynamically utilizes the device resources according to the model parameters. Using the proposed architecture with DPR, different NN types and architectures can be realized on the same FPGA. Moreover, the proposed architecture maximizes throughput through design optimizations that take the resources available on the hardware platform into account. We evaluate our design with different NN architectures for two different tasks. The first task is image classification on two distinct datasets, which requires switching between Convolutional Neural Network (CNN) architectures with different layer structures. The second task requires switching between NN paradigms, namely a CNN architecture with high accuracy and throughput and a hybrid architecture that combines convolutional layers with an optimized Spiking Neural Network (SNN) architecture. We demonstrate the throughput achieved when only a tiny part of the FPGA hardware is quickly reprogrammed using DPR. Experimental results show that the implemented designs achieve a 7× higher frame rate than current FPGA accelerators while being extremely flexible and using comparable resources.

Highlights

  • The application of artificial intelligence models at the edge requires novel software and hardware architectures capable of executing many tasks in an energy-efficient manner

  • In the first use case, three different Convolutional Neural Network (CNN) classifiers, trained on three different datasets, are designed and switched using Dynamic Partial Reconfiguration (DPR) in two different scenarios

  • The weights and biases of structurally identical layers are updated through the memory-mapped interface of the processing elements (PEs), while non-identical layers are swapped in by dynamically partially reconfiguring the corresponding PE (a host-side sketch of this two-path update is shown after this list)
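
To make the two-path update from the last highlight concrete, the following Python sketch shows how a host program might drive it. It assumes the standard Linux FPGA Manager sysfs interface for loading partial bitstreams and a memory-mapped AXI register file for the PE parameters; PE_BASE_ADDR, WEIGHT_OFFSET, and both function names are hypothetical placeholders, not the interface described in the paper.

    import mmap
    import struct

    PE_BASE_ADDR = 0xA0000000   # hypothetical AXI base address of the PE register file
    WEIGHT_OFFSET = 0x100       # hypothetical offset of the weight/bias write window
    FPGA_MGR = "/sys/class/fpga_manager/fpga0"

    def update_weights(words):
        # Path 1 -- structurally identical layer: write the new weights and
        # biases over the memory-mapped interface; no reconfiguration needed.
        with open("/dev/mem", "r+b") as f:
            mem = mmap.mmap(f.fileno(), 0x1000, offset=PE_BASE_ADDR)
            for i, w in enumerate(words):   # each w is a packed 32-bit parameter word
                mem[WEIGHT_OFFSET + 4 * i : WEIGHT_OFFSET + 4 * i + 4] = struct.pack("<I", w)
            mem.close()

    def reconfigure_pe(partial_bitstream):
        # Path 2 -- non-identical layer: load a partial bitstream into the PE's
        # reconfigurable region through the Linux FPGA Manager sysfs interface.
        with open(FPGA_MGR + "/flags", "w") as f:
            f.write("1")                    # flag bit 0 requests partial reconfiguration
        with open(FPGA_MGR + "/firmware", "w") as f:
            f.write(partial_bitstream)      # bitstream file must reside in /lib/firmware

Updating parameters over the memory-mapped path avoids the latency of reconfiguration, which is why only structurally different layers pay the DPR cost.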


Introduction

The application of artificial intelligence models at the edge requires novel software and hardware architectures capable of executing many tasks in an energy-efficient manner. It is mandatory to execute many applications on the same hardware platform in the most efficient way [1,2]. State-of-the-art Central Processing Units (CPUs) deliver 10–100 GFLOP per second with a typical power efficiency on the order of 1 GOP/J [3]. We focus solely on accelerated and efficient inference, that is, using a pre-trained and optimized model to perform prediction, regression, or classification tasks. In standard CNN models, the parameters of each layer are the connection weights and the neuron biases, and the inputs and outputs of each layer are activations. CNN neurons do not store any state, since their activations are recomputed for each input. In SNN models, the parameters of each layer are the connection weights, but, in contrast to CNN neurons, each spiking neuron maintains an internal state (its membrane potential) that evolves over time as input spikes arrive.
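
To make the stateless-versus-stateful distinction concrete, here is a minimal Python sketch (illustrative only, not the paper's implementation) contrasting a single CNN neuron with a leaky integrate-and-fire (LIF) spiking neuron; the leak factor and threshold values are arbitrary assumptions.

    import numpy as np

    def cnn_neuron(x, w, b):
        # Stateless: the activation is recomputed from scratch for every input.
        return max(0.0, float(np.dot(w, x) + b))    # ReLU activation

    class LIFNeuron:
        # Stateful: the membrane potential persists across input time steps.
        def __init__(self, w, threshold=1.0, leak=0.9):
            self.w = w
            self.threshold = threshold
            self.leak = leak
            self.v = 0.0                            # membrane potential (the state)

        def step(self, spikes_in):
            # Leaky integration of weighted input spikes, then threshold-and-reset.
            self.v = self.leak * self.v + float(np.dot(self.w, spikes_in))
            if self.v >= self.threshold:
                self.v = 0.0                        # reset after emitting a spike
                return 1                            # binary output spike
            return 0

The key difference for the hardware is that the SNN neuron must store its state between time steps, whereas the CNN neuron needs no memory between inputs.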
