Elastic-DF: Scaling Performance of DNN Inference in FPGA Clouds through Automatic Partitioning

Tobias Alonso, Yaman Umuroglu, Jakoba Petri-Koenig, Lucian Petrica, Michaela Blott, Elias Koromilas, Ioannis Stamelos, Mario Ruiz, Kees Vissers

doi:10.1145/3470567

Abstract

Customized compute acceleration in the datacenter is key to the wider roll-out of applications based on deep neural network (DNN) inference. In this article, we investigate how to maximize the performance and scalability of field-programmable gate array (FPGA)-based pipeline dataflow DNN inference accelerators (DFAs) automatically on computing infrastructures consisting of multi-die, network-connected FPGAs. We present Elastic-DF, a novel resource partitioning tool and associated FPGA runtime infrastructure that integrates with the DNN compiler FINN. Elastic-DF allocates FPGA resources to DNN layers and layers to individual FPGA dies to maximize the total performance of the multi-FPGA system. In the resulting Elastic-DF mapping, the accelerator may be instantiated multiple times, and each instance may be segmented across multiple FPGAs transparently, whereby the segments communicate peer-to-peer through 100 Gbps Ethernet FPGA infrastructure, without host involvement. When applied to ResNet-50, Elastic-DF provides a 44% latency decrease on Alveo U280. For MobileNetV1 on Alveo U200 and U280, Elastic-DF enables a 78% throughput increase, eliminating the performance difference between these cards and the larger Alveo U250. Elastic-DF also increases operating frequency in all our experiments, on average by over 20%. Elastic-DF therefore increases performance portability between different sizes of FPGA and increases the critical throughput per cost metric of datacenter inference.
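The abstract does not spell out how layers are assigned to dies, so the following is only a minimal sketch of the general idea, not the Elastic-DF algorithm itself. It assumes a purely sequential layer pipeline, a single hypothetical resource figure per layer (for example, thousands of LUTs), and identical dies. It uses a classic linear-partition dynamic program to split the pipeline into contiguous segments so that the most heavily loaded die is as lightly loaded as possible. The function name `partition_layers` and all numbers are illustrative assumptions.

```python
from typing import List, Optional


def partition_layers(costs: List[int], num_dies: int,
                     capacity: int) -> Optional[List[List[int]]]:
    """Split a layer pipeline into at most num_dies contiguous segments.

    Minimizes the largest per-die resource total (linear-partition DP).
    Returns a list of layer-index lists, one per die used, or None if
    even the best split exceeds the per-die capacity.
    """
    n = len(costs)
    # prefix[i] = total cost of layers 0..i-1, for O(1) segment sums
    prefix = [0]
    for c in costs:
        prefix.append(prefix[-1] + c)

    inf = float("inf")
    # best[k][i]: smallest achievable bottleneck when the first i layers
    # are placed on exactly k dies; cut[k][i]: start of the last segment
    best = [[inf] * (n + 1) for _ in range(num_dies + 1)]
    cut = [[0] * (n + 1) for _ in range(num_dies + 1)]
    best[0][0] = 0
    for k in range(1, num_dies + 1):
        for i in range(1, n + 1):
            for j in range(k - 1, i):
                load = prefix[i] - prefix[j]  # cost of layers j..i-1
                cand = max(best[k - 1][j], load)
                if cand < best[k][i]:
                    best[k][i] = cand
                    cut[k][i] = j

    # Pick the die count with the smallest bottleneck (fewest dies on ties).
    k_best = min(range(1, num_dies + 1), key=lambda k: best[k][n])
    if best[k_best][n] > capacity:
        return None

    # Walk the cut table backwards to recover the segments.
    segments: List[List[int]] = []
    i, k = n, k_best
    while k > 0:
        j = cut[k][i]
        segments.append(list(range(j, i)))
        i, k = j, k - 1
    segments.reverse()
    return segments


if __name__ == "__main__":
    layer_costs = [4, 2, 3, 5, 1]  # hypothetical per-layer resource units
    print(partition_layers(layer_costs, num_dies=3, capacity=6))
```

A real partitioner would also have to account for several resource types at once, the bandwidth of die-to-die and FPGA-to-FPGA links, and the option of instantiating the accelerator more than once, all of which this sketch leaves out.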

Journal: ACM Transactions on Reconfigurable Technology and Systems
Publication Date: Dec 6, 2021
Citations: 13
License type: other-oa

Similar Papers

Throughput Maximization of Delay-Aware DNN Inference in Edge Computing by Exploring DNN Model Partitioning and Inference Parallelism
Jing Li ... Song Guo
IEEE Transactions on Mobile Computing | VOL. 22
01 May 2023

Delay-Aware DNN Inference Throughput Maximization in Edge Computing via Jointly Exploring Partitioning and Parallelism
Jing Li ... Weifa Liang
04 Oct 2021

A Strategy to Accelerate the Inference of a Complex Deep Neural Network
P Haseena Rahmath ... Kuldeep Chaurasia
01 Jan 2023

IGniter: Interference-Aware GPU Resource Provisioning for Predictable DNN Inference in the Cloud
Fei Xu ... Zhi Zhou
IEEE Transactions on Parallel and Distributed Systems | VOL. 34
01 Mar 2023
