A high-performance dataflow-centric optimization framework for deep learning inference on the edge

Runhua Zhang, Hongxu Jiang, Jinkun Geng, Fangzheng Tian, Yuhang Ma, Haojie Wang

doi:10.1016/j.sysarc.2024.103180

Abstract

Edge computing has emerged as a popular scenario for model inference. However, inference on edge devices (e.g., multi-core DSPs and FPGAs) suffers from inefficiency because highly optimized inference frameworks are lacking. Previous model inference frameworks are mainly developed in an operator-centric way, which provides insufficient acceleration for edge-based inference. Moreover, operator-centric frameworks incur significant costs for continuous development and maintenance. To address these drawbacks, we design Xenos, which automatically performs dataflow-centric optimization of the computation graph and accelerates inference in two dimensions. Vertically, Xenos develops an operator linking technique that improves data locality by restructuring the inter-operator dataflow. Horizontally, Xenos develops a DSP-aware operator split technique that enables higher parallelism across multiple DSP units. Our evaluation demonstrates the effectiveness of vertical and horizontal dataflow optimization, which reduce inference time by 15.0%–84.9% and 17.9%–89.9%, respectively. Xenos also outperforms the widely used TVM by 1.1×–1.9×. Finally, we extend Xenos to a distributed solution, d-Xenos, which employs multiple edge devices to jointly conduct the inference task and achieves a speedup of 3.68×–3.78× over a single device.
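
The two dataflow ideas named above can be illustrated with a toy sketch. This is not Xenos's implementation: operators are plain Python functions over lists, "linking" is modeled as generic tile-by-tile producer/consumer execution, a "DSP unit" is simulated by one slice of output rows executed in sequence, and the helper names (`linked`, `split_rows`, `split_matvec`) and parameters (`tile`, `units`) are hypothetical choices made for illustration.

```python
"""Toy illustration of vertical (linking) and horizontal (split) dataflow.

Assumptions, not taken from Xenos: operators are plain functions over
lists, tiles and "DSP units" are arbitrary illustrative parameters, and
the split slices run sequentially instead of on real parallel cores.
"""
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def matvec(w: Matrix, x: Vector) -> Vector:
    """Dense layer without bias: y[i] = sum_j w[i][j] * x[j]."""
    return [sum(wij * xj for wij, xj in zip(row, x)) for row in w]


def relu(v: Vector) -> Vector:
    return [max(0.0, a) for a in v]


# Operator-centric baseline: each operator runs to completion, so the whole
# intermediate tensor is materialized between the two operators.
def unlinked(w: Matrix, x: Vector) -> Vector:
    intermediate = matvec(w, x)  # full output written out
    return relu(intermediate)    # then read back by the next operator


# Hypothetical "linked" variant (vertical): the consumer runs on each tile
# as soon as the producer emits it, so only one small tile is live at a time.
def linked(w: Matrix, x: Vector, tile: int = 2) -> Vector:
    out: Vector = []
    for start in range(0, len(w), tile):
        partial = matvec(w[start:start + tile], x)  # produce one tile
        out.extend(relu(partial))                   # consume it immediately
    return out


# Hypothetical operator split (horizontal): partition output rows as evenly
# as possible across `units` simulated DSP cores.
def split_rows(n_rows: int, units: int) -> List[range]:
    base, extra = divmod(n_rows, units)
    ranges, start = [], 0
    for u in range(units):
        size = base + (1 if u < extra else 0)
        ranges.append(range(start, start + size))
        start += size
    return ranges


def split_matvec(w: Matrix, x: Vector, units: int = 3) -> Vector:
    # Each slice is independent; on real hardware the slices would run
    # concurrently. Here they run in order and are concatenated.
    slices = [matvec([w[i] for i in r], x) for r in split_rows(len(w), units)]
    return [y for s in slices for y in s]
```

Both transformations change only the order and placement of work, not the result: for any weights `w` and input `x`, `linked(w, x)` equals `unlinked(w, x)` and `split_matvec(w, x)` equals `matvec(w, x)`. What the sketch cannot show is the performance side, which depends on the target's memory hierarchy and core count.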

Published in: Journal of Systems Architecture

Similar Papers

The Case for Hierarchical Deep Learning Inference at the Network Edge
Ghina Al-Atat ... James Gross
18 Jun 2023

POS: An Operator Scheduling Framework for Multi-model Inference on Edge Intelligent Computing
Ziyang Zhang ... Jie Liu
09 May 2023

Fault-tolerant deep learning inference on CPU-GPU integrated edge devices with TEEs
Hongjian Xu ... Yuanlong Yu
Future Generation Computer Systems | VOL. 161
20 Jul 2024

Agile-Quant: Activation-Guided Quantization for Faster Inference of LLMs on the Edge
Xuan Shen ... Zhengang Li
Proceedings of the AAAI Conference on Artificial Intelligence | VOL. 38
24 Mar 2024