Abstract

As the Internet of Things (IoT) keeps growing, IoT-side intelligence services, such as intelligent personal assistants, healthcare monitoring, and smart home services, offload increasingly complex machine learning (ML) inference workloads to cloud clusters. GPUs have been widely adopted to accelerate the execution of these ML inference workloads. However, current cluster management systems guarantee low tail latency for ML inference by over-provisioning resources and using small batch sizes, which wastes GPU resources and greatly increases service costs. To mitigate poor GPU utilization, we present AutoInfer, a self-driving cluster management system for ML inference serving in GPU clusters, where users express only the latency and accuracy requirements of their workloads without needing to specify the model variant, GPU provisioning strategy, or batching mechanism. AutoInfer extends matrix factorization to automatically recommend a model variant for each newly arriving ML inference workload that meets its latency and accuracy requirements, by identifying similarities to previously scheduled workloads. At runtime, AutoInfer leverages online telemetry data and deep reinforcement learning to adaptively adjust GPU allocation and batch size in response to load variations while minimizing the impact on tail-latency Service Level Objectives (SLOs). Testbed experiments show that AutoInfer improves average GPU utilization by up to 77% and keeps tail-latency SLO violations under 5.5%.
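
The abstract does not give the exact formulation, but the recommendation step can be pictured with a small matrix-factorization sketch (Python/NumPy). Everything here is an illustrative assumption rather than the paper's method: the workload-by-variant latency matrix, the alternating-least-squares solver, the ridge-style recovery of a new workload's latent factor from a few profiling runs, and all names such as factorize and recommend.

    # Minimal sketch of matrix-factorization-based variant recommendation.
    # All names, dimensions, and the recovery procedure are illustrative
    # assumptions, not AutoInfer's actual formulation.
    import numpy as np

    rng = np.random.default_rng(0)

    def factorize(L, mask, rank=2, reg=0.1, iters=50):
        """Factor a partially observed workload x variant latency matrix L ~ W @ V.T.

        L:    (n_workloads, n_variants) latency matrix (values only valid where mask is True)
        mask: boolean matrix, True where a latency measurement exists
        """
        n, m = L.shape
        W = rng.standard_normal((n, rank))
        V = rng.standard_normal((m, rank))
        I = reg * np.eye(rank)
        for _ in range(iters):
            for i in range(n):                      # update workload factors
                obs = mask[i]
                Vi = V[obs]
                W[i] = np.linalg.solve(Vi.T @ Vi + I, Vi.T @ L[i, obs])
            for j in range(m):                      # update variant factors
                obs = mask[:, j]
                Wj = W[obs]
                V[j] = np.linalg.solve(Wj.T @ Wj + I, Wj.T @ L[obs, j])
        return W, V

    def recommend(profiled, V, accuracy, latency_slo, reg=0.1):
        """Recommend the most accurate variant whose predicted latency meets the SLO.

        profiled: dict {variant_index: measured_latency} from a few quick
                  profiling runs of the new workload
        V:        learned variant factors
        accuracy: per-variant accuracy (higher is better)
        """
        idx = np.array(list(profiled.keys()))
        y = np.array(list(profiled.values()))
        Vi = V[idx]
        # Ridge regression estimates the new workload's latent factor
        # from the handful of profiled variants.
        w = np.linalg.solve(Vi.T @ Vi + reg * np.eye(V.shape[1]), Vi.T @ y)
        pred_latency = V @ w                        # predicted latency on every variant
        feasible = np.where(pred_latency <= latency_slo)[0]
        if feasible.size == 0:
            return None                             # no variant meets the SLO
        return feasible[np.argmax(accuracy[feasible])]

    # Toy usage: 6 previously scheduled workloads, 4 model variants.
    true_latency = np.outer(rng.uniform(1, 3, 6), rng.uniform(5, 40, 4))   # ms
    mask = rng.random((6, 4)) < 0.7                 # only some pairs were ever observed
    W, V = factorize(true_latency * mask, mask)

    accuracy = np.array([0.70, 0.78, 0.85, 0.91])   # hypothetical per-variant accuracy
    new_profile = {0: 12.0, 2: 55.0}                # quick measurements for a new workload
    print("recommended variant:", recommend(new_profile, V, accuracy, latency_slo=60.0))

In this sketch, a variant is chosen by filtering the predicted latencies against the SLO and picking the highest-accuracy survivor, mirroring the interface described above in which users state only latency and accuracy requirements.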
