Abstract

Power capping is an important technique for high-density servers to safely oversubscribe the power infrastructure in a data center. However, power capping is commonly accomplished by dynamically lowering the server processors’ frequency levels, which can degrade application performance. For servers that run important machine learning (ML) applications with Service-Level Objective (SLO) requirements, inference performance metrics such as recognition accuracy must be optimized within a certain latency constraint, which demands high server performance. To achieve the best inference accuracy under the desired latency and server power constraints, this paper proposes OptimML, a multi-input-multi-output (MIMO) control framework that jointly controls both inference latency and server power consumption by flexibly adjusting the machine learning model size (and thus its required computing resources) when the server frequency needs to be lowered for power capping. Our results on a hardware testbed with widely adopted ML frameworks (including PyTorch, TensorFlow, and MXNet) show that OptimML achieves higher inference accuracy than several well-designed baselines while respecting both latency and power constraints. Furthermore, an adaptive control scheme with online model switching and estimation is designed to provide analytic assurance of control accuracy and system stability, even in the face of significant workload and hardware variations.
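The abstract does not specify the controller's internals; as a rough illustration only, the sketch below shows how a feedback loop might jointly actuate CPU frequency (for the power cap) and ML model size (for the latency SLO). All function names (measure_power_w, measure_latency_ms, set_cpu_freq_khz), constants, and gains are hypothetical placeholders, not OptimML's actual design.

```python
# Hypothetical sketch of a joint power/latency feedback loop; the sensor and
# actuator functions below are placeholders, not real APIs.
import time

POWER_CAP_W = 200.0          # assumed server power budget
LATENCY_SLO_MS = 50.0        # assumed inference latency constraint
MODEL_VARIANTS = ["resnet18", "resnet34", "resnet50"]  # small -> large (illustrative)

def measure_power_w() -> float:
    return 195.0             # placeholder: would read server power (e.g., RAPL/IPMI)

def measure_latency_ms() -> float:
    return 45.0              # placeholder: would average recent inference latencies

def set_cpu_freq_khz(freq_khz: int) -> None:
    pass                     # placeholder: would apply a DVFS setting (e.g., cpufreq)

def control_step(freq_khz: int, model_idx: int) -> tuple:
    power, latency = measure_power_w(), measure_latency_ms()

    # Power knob: lower the frequency when over the cap, raise it when there is headroom.
    freq_khz += int(2_000 * (POWER_CAP_W - power))
    freq_khz = max(800_000, min(3_600_000, freq_khz))
    set_cpu_freq_khz(freq_khz)

    # Accuracy/latency knob: shrink the model if the SLO is violated at the new
    # frequency; grow it back when there is latency slack, to regain accuracy.
    if latency > LATENCY_SLO_MS and model_idx > 0:
        model_idx -= 1
    elif latency < 0.8 * LATENCY_SLO_MS and model_idx < len(MODEL_VARIANTS) - 1:
        model_idx += 1
    return freq_khz, model_idx

if __name__ == "__main__":
    freq, idx = 2_400_000, len(MODEL_VARIANTS) - 1   # start with the largest model
    for _ in range(5):
        freq, idx = control_step(freq, idx)
        time.sleep(1.0)                              # assumed control period
```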
