Accelerating End-to-End Deep Learning Workflow with Codesign of Data Preprocessing and Scheduling

Yang Cheng, Yongqiang Xiong, Xi Fan, Peng Cheng, Jiaxin Lin, Ran Shu, Jinkun Geng, Zhiyuan Guo, Xinyi Yu, Wei Bai, Lei Qu, Dan Li, Binyao Jiang, Jianping Wu

doi:10.1109/tpds.2020.3047966

Abstract

In this article, we investigate the performance bottleneck of existing deep learning (DL) systems and propose DLBooster to improve the efficiency of running DL applications on GPU clusters. At its core, DLBooster applies two levels of optimization to boost the end-to-end DL workflow. First, it selectively offloads key decoding workloads to FPGAs, providing high-performance online data preprocessing services to the computing engine. Second, it reorganizes the computational workloads of training neural networks with the backpropagation algorithm and schedules them according to their dependencies to improve GPU utilization at runtime. Our experiments show that, compared with baselines, DLBooster improves image processing throughput by 1.4×–2.5× and reduces processing latency by one-third in several real-world DL applications and datasets. Moreover, DLBooster consumes less than one CPU core to manage FPGA devices at runtime, which is at least 90 percent less than the baselines in some cases. These results show DLBooster's potential to accelerate DL workflows in the cloud.
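To make the idea of scheduling backpropagation workloads by their dependencies concrete, the following is a minimal sketch and not DLBooster's actual scheduler, which the abstract does not describe in detail. It assumes a simple sequential network in which each layer contributes two backward tasks: an input-gradient task (`dX{i}`) and a weight-gradient task (`dW{i}`), both depending only on the gradient flowing in from the layer above. The task names and the helper functions `backward_task_graph` and `schedule_waves` are hypothetical and introduced only for illustration.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+


def backward_task_graph(num_layers):
    """Map each backward task to the set of tasks it depends on.

    Assumption (illustrative only): layers are numbered 1..num_layers,
    and both dX{i} and dW{i} need the gradient from the layer above.
    """
    graph = {}
    for i in range(num_layers, 0, -1):
        upstream = f"dX{i + 1}" if i < num_layers else "loss_grad"
        graph[f"dX{i}"] = {upstream}  # on the critical path
        graph[f"dW{i}"] = {upstream}  # off the critical path
    return graph


def schedule_waves(graph):
    """Group tasks into waves; tasks within one wave are independent."""
    sorter = TopologicalSorter(graph)
    sorter.prepare()
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # all tasks whose deps are done
        waves.append(ready)
        sorter.done(*ready)
    return waves


if __name__ == "__main__":
    for step, wave in enumerate(schedule_waves(backward_task_graph(3))):
        print(step, wave)
```

In this sketch each wave contains tasks with no mutual dependencies, so a runtime could in principle issue them concurrently (for example, computing a layer's weight gradient while the input gradient for the same layer proceeds along the critical path).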

Journal: IEEE Transactions on Parallel and Distributed Systems
Publication Date: Jan 1, 2020
Citations: 39

Similar Papers

Three Reasons Why Artificial Intelligence Might Be the Radiologist's Best Friend.
Rick R Van Rijn ... Alberto De Luca
Radiology | VOL. 296
21 Apr 2020

Towards Understanding the Faults of JavaScript-Based Deep Learning Systems
Lili Quan ... Xiaofei Xie
10 Oct 2022

Clones in deep learning code: what, where, and why?
Hadhemi Jebnoun ... Md Saidur Rahman
Empirical Software Engineering | VOL. 27
08 Apr 2022

Improving deep learning performance for predicting large-scale geological CO₂ sequestration modeling through feature coarsening
Bicheng Yan ... Dylan Robert Harp
Scientific Reports | VOL. 12
30 Nov 2022
