Abstract

Many emerging systems concurrently execute multiple applications that use deep neural networks (DNNs) as a key portion of the computation. To speed up the execution of such DNNs, various hardware accelerators have been proposed in recent works. The deep learning processor unit (DPU) from Xilinx is one such accelerator, targeted at field programmable gate array (FPGA)-based systems. We study the runtime and energy consumption of different DNNs on a range of DPU configurations and derive useful insights. Using these insights, we formulate a design space exploration (DSE) strategy to explore tradeoffs in accuracy, runtime, cost, and energy consumption that arise from the flexibility in choosing the DNN topology, DPU configuration, and FPGA model. The proposed strategy reduces the number of design points to be simulated by 28× and the pruning time by 23×.
