Abstract

Serving machine-learning (ML) model inference at low latency requires clusters equipped with expensive hardware accelerators (e.g., GPUs). Advanced inference serving systems are needed to satisfy latency service-level objectives (SLOs) in a cost-effective manner. Existing autoscaling mechanisms greedily minimize the number of service instances while ensuring SLO compliance. However, we find that this is inadequate to guarantee cost effectiveness across heterogeneous GPU hardware, and that it fails to maximize resource utilization. In this paper, we propose HetSev, which addresses these challenges through heterogeneity-aware autoscaling and resource-efficient scheduling. We develop an autoscaling mechanism that accounts for both SLO compliance and GPU heterogeneity, provisioning the appropriate type and number of instances to guarantee cost effectiveness. We leverage multi-tenant inference to improve GPU resource utilization, while alleviating inter-tenant interference by avoiding the co-location of identical ML instances on the same GPU during placement decisions. We integrated HetSev into Kubernetes and deployed it on a heterogeneous GPU cluster. We evaluated the performance of HetSev using several representative ML models. Compared with default Kubernetes, HetSev reduces resource cost by up to 2.15× while meeting SLO requirements.
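To illustrate the core idea of heterogeneity-aware provisioning, the sketch below picks the cheapest (GPU type, instance count) combination that covers the offered load. This is a minimal illustration, not HetSev's actual algorithm or API: the names (GpuProfile, GPU_PROFILES, provision) and the throughput/cost numbers are hypothetical, and it assumes per-GPU-type throughput under the SLO has been profiled offline.

```python
# Hypothetical sketch of heterogeneity-aware provisioning in the spirit of
# HetSev. Assumes each GPU type has been profiled offline for the sustainable
# request rate per instance that still meets the latency SLO.

from dataclasses import dataclass
import math

@dataclass(frozen=True)
class GpuProfile:
    name: str              # GPU type, e.g. "V100" or "T4"
    throughput_rps: float  # requests/s one instance sustains under the SLO
    cost_per_hour: float   # hourly price of one instance of this type

# Illustrative (made-up) offline-profiled numbers for one ML model.
GPU_PROFILES = [
    GpuProfile("V100", throughput_rps=120.0, cost_per_hour=3.06),
    GpuProfile("T4",   throughput_rps=45.0,  cost_per_hour=0.95),
]

def provision(load_rps: float) -> tuple[GpuProfile, int]:
    """For each GPU type, compute the minimum instance count that covers
    the load, then return the type/count pair with the lowest hourly cost."""
    best = None
    for p in GPU_PROFILES:
        count = max(1, math.ceil(load_rps / p.throughput_rps))
        cost = count * p.cost_per_hour
        if best is None or cost < best[2]:
            best = (p, count, cost)
    profile, count, _ = best
    return profile, count

gpu, n = provision(load_rps=200.0)
print(f"provision {n} x {gpu.name}")
# -> 5 x T4 ($4.75/h), which undercuts 2 x V100 ($6.12/h) for this load
```

Note how a homogeneous, count-only autoscaler pinned to V100s would pay more for the same load; considering the instance type alongside the count is what makes the decision cost-effective. The placement side of the paper would then spread these instances so that no two replicas of the same model share a GPU, reducing inter-tenant interference.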
