Abstract
With DNNs becoming the backbone of AI cloud services and propelling the emergence of INFerence-as-a-Service (INFaaS), DNN-specific accelerators have become indispensable components of cloud inference systems. Because of the conservative “one-task-at-a-time” working mode and deadline blindness of these accelerators, implementing multi-tenancy that improves cost-effectiveness while meeting SLA requirements is intractable. Recent studies, including temporal and spatial approaches, employ manifold scheduling mechanisms and sophisticated architectural innovations to address this challenge. However, they either still neglect deadline awareness or incur unavoidable and expensive hardware overheads such as switches and storage. In this paper, we present Cooperative and Deadline-aware Multi-Systolic-Array scheduling (CD-MSA), a low-cost solution for cloud inference that exploits real-time mechanisms and task-level parallelism to enable efficient multi-tenancy. Building on our preemptive multi-systolic-array accelerator architecture, which supports simultaneous task co-location, we first construct a fine-grained DNN execution model that lays the groundwork for lightweight preemption. Second, we design a cooperative, deadline- and laxity-aware scheduler together with an efficient schedulability test for a stronger QoS guarantee without introducing additional hardware cost. Finally, to further raise overall throughput, we propose dynamic task fusion, a software approach that fuses different tasks into logically “multi-threaded” tasks at runtime. We compare CD-MSA with several state-of-the-art designs across three multi-DNN workloads.
The evaluation results show that CD-MSA improves latency-bounded throughput, SLA satisfaction rate, and weighted system throughput by up to 62%, 63%, and 27%, respectively.
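The abstract does not spell out the scheduler's exact selection policy, but a deadline- and laxity-aware scheduler is commonly built around least-laxity-first ordering: a task's laxity is its time to deadline minus its remaining work, and the runnable task with the smallest laxity is dispatched next. The sketch below illustrates that generic idea only; the task names, fields, and helper functions are hypothetical and not taken from CD-MSA.

```python
from dataclasses import dataclass

@dataclass
class Task:
    name: str
    deadline: float   # absolute deadline (same clock as `now`)
    remaining: float  # estimated remaining execution time

def laxity(task: Task, now: float) -> float:
    """Laxity = slack before the deadline after finishing remaining work.
    Smaller laxity means the task is more urgent; negative laxity means
    the deadline can no longer be met."""
    return (task.deadline - now) - task.remaining

def pick_next(ready: list[Task], now: float) -> Task:
    """Least-laxity-first: dispatch the ready task with the least slack."""
    return min(ready, key=lambda t: laxity(t, now))

# Hypothetical inference requests with per-request SLA deadlines.
ready = [
    Task("resnet", deadline=10.0, remaining=4.0),  # laxity 6
    Task("bert",   deadline=7.0,  remaining=5.0),  # laxity 2
    Task("vgg",    deadline=20.0, remaining=3.0),  # laxity 17
]
print(pick_next(ready, now=0.0).name)  # -> bert
```

A simple schedulability check in this setting is that every ready task keeps non-negative laxity; once a task's laxity goes negative, its deadline is already unreachable.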
Published in: IEEE Transactions on Parallel and Distributed Systems