Abstract

The explosive increase in deep learning (DL) deployment has made GPU power usage a major factor in the operational cost of modern HPC clusters. The complex mixture of DL processing, fluctuating renewable generation, and dynamic electricity prices hinders fine-grained GPU power control, which leads to undesirable cost. However, most previous studies have focused only on designing power management methods that use DL, and have not considered the cost of the GPU power consumed by DL processing itself. Taking the opposite direction, this paper proposes a real-time power controller called DeepPow-CTR for cost-efficient DL processing in GPU-based clusters. We design a GPU frequency scaling algorithm based on model predictive control (MPC) that finely tunes DL power consumption in response to dynamic renewable generation and electricity prices. At the same time, we avoid unacceptable DL performance degradation by regulating the memory-access / feed-forward / back-propagation (MFB) time per minibatch in deep neural network (DNN) model training. To solve the resulting nonlinear MPC problem quickly and accurately, we apply a damped Broyden-Fletcher-Goldfarb-Shanno (BFGS) based sequential quadratic programming (SQP) method in DeepPow-CTR. Experimental results on a lab-scale testbed using real trace data of renewable generation and electricity prices demonstrate the advantages and practicality of DeepPow-CTR in terms of DL processing power cost and performance, compared with existing methods.
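
The abstract names a damped BFGS update inside SQP. The sketch below shows the widely used Powell-damped BFGS update of a Hessian approximation B in plain Python. It illustrates the standard technique under our own assumptions (dense list-of-lists matrices, damping constant c = 0.2, hypothetical helper names dot, matvec, and damped_bfgs_update); it is not the DeepPow-CTR implementation. Inside SQP, y would be the change in the gradient of the Lagrangian between iterates.

```python
from typing import List

Matrix = List[List[float]]
Vector = List[float]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def matvec(A: Matrix, v: Vector) -> Vector:
    return [dot(row, v) for row in A]


def damped_bfgs_update(B: Matrix, s: Vector, y: Vector, c: float = 0.2) -> Matrix:
    """Powell-damped BFGS update of a symmetric positive definite B.

    s = x_{k+1} - x_k and y = gradient difference (of the Lagrangian, in SQP).
    y is replaced by r = theta*y + (1 - theta)*B s so that s^T r >= c * s^T B s > 0,
    which keeps B positive definite even when s^T y <= 0.
    """
    Bs = matvec(B, s)
    sBs = dot(s, Bs)
    sy = dot(s, y)
    if sy >= c * sBs:
        theta = 1.0  # enough curvature: ordinary BFGS update
    else:
        theta = (1.0 - c) * sBs / (sBs - sy)  # gives s^T r = c * s^T B s exactly
    r = [theta * yi + (1.0 - theta) * bsi for yi, bsi in zip(y, Bs)]
    sr = dot(s, r)
    n = len(s)
    # B_new = B - (B s)(B s)^T / (s^T B s) + r r^T / (s^T r)
    return [
        [B[i][j] - Bs[i] * Bs[j] / sBs + r[i] * r[j] / sr for j in range(n)]
        for i in range(n)
    ]
```

When the measured curvature s^T y is large enough, the update is plain BFGS and satisfies the secant condition B_new s = y; otherwise it interpolates toward B s, trading the exact secant condition for guaranteed positive definiteness, which the QP subproblems of SQP rely on.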

Highlights

  • Deep learning (DL), a form of “representation learning”, has emerged as a novel and powerful heuristic for solving complex problems in various industrial fields [1]–[3]

  • This paper proposes a real-time GPU power controller called DeepPow-CTR, which achieves a cost-efficient balance between DL processing power and performance based on a cluster owner-specified trade-off

  • Our nonlinear model predictive control (MPC) formulation, solved with sequential quadratic programming (SQP), balances DL processing power and performance in response to electricity price and renewable generation
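
The receding-horizon idea behind an MPC power controller can be illustrated with a toy example. Everything numeric here is hypothetical: the frequency levels, the cubic power model P(f), the assumption that MFB time scales as 1/f, the price units (cost per kW drawn from the grid per control step), the weights, and the helper names (stage_cost, mpc_step, run_controller). To keep the sketch short, it enumerates every frequency sequence over the horizon instead of solving a nonlinear program with SQP, and a move-suppression term couples consecutive steps.

```python
from itertools import product
from typing import List, Sequence

# Hypothetical models and parameters, chosen only for illustration.
FREQS_GHZ = (0.9, 1.2, 1.5)     # discrete GPU frequency levels
P_IDLE_W, K_W = 60.0, 50.0      # GPU power model: P(f) = P_IDLE_W + K_W * f**3
T_AT_FMAX_S, F_MAX = 0.40, 1.5  # MFB time per minibatch at F_MAX, assumed to scale as 1/f


def gpu_power_w(f: float) -> float:
    return P_IDLE_W + K_W * f ** 3


def mfb_time_s(f: float) -> float:
    return T_AT_FMAX_S * F_MAX / f


def stage_cost(f: float, f_prev: float, price: float, renewable_w: float,
               t_ref_s: float, perf_weight: float, move_weight: float) -> float:
    grid_kw = max(gpu_power_w(f) - renewable_w, 0.0) / 1000.0  # renewables cover part of the load
    slowdown_s = max(mfb_time_s(f) - t_ref_s, 0.0)              # MFB time above the target
    return price * grid_kw + perf_weight * slowdown_s + move_weight * abs(f - f_prev)


def mpc_step(prices: Sequence[float], renewables_w: Sequence[float], f_prev: float,
             t_ref_s: float = 0.4, perf_weight: float = 1.0, move_weight: float = 0.01) -> float:
    """Find the cheapest frequency sequence over the horizon; return only its first move."""
    def sequence_cost(seq) -> float:
        total, prev = 0.0, f_prev
        for f, p, r in zip(seq, prices, renewables_w):
            total += stage_cost(f, prev, p, r, t_ref_s, perf_weight, move_weight)
            prev = f
        return total

    best = min(product(FREQS_GHZ, repeat=len(prices)), key=sequence_cost)
    return best[0]


def run_controller(prices: Sequence[float], renewables_w: Sequence[float],
                   horizon: int, f0: float, **weights: float) -> List[float]:
    """Receding horizon: re-optimize over a shifted window each step, apply the first move."""
    applied, f_prev = [], f0
    for t in range(len(prices)):
        f = mpc_step(prices[t:t + horizon], renewables_w[t:t + horizon], f_prev, **weights)
        applied.append(f)
        f_prev = f
    return applied
```

For example, run_controller([1.0] * 5, [0.0] * 5, horizon=3, f0=1.5, perf_weight=0.0) ignores performance and drops to the lowest frequency, while the default weight keeps the GPU at 1.5 GHz. Enumeration costs len(FREQS_GHZ) ** horizon evaluations per step, which is one reason a gradient-based solver such as SQP is preferable for longer horizons or continuous frequency ranges.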


Summary

INTRODUCTION

Deep learning (DL), a form of “representation learning”, has emerged as a novel and powerful heuristic for solving complex problems in various industrial fields [1]–[3]. This paper proposes a real-time GPU power controller called DeepPow-CTR, which achieves a cost-efficient balance between DL processing power and performance based on a cluster owner-specified trade-off. DeepPow-CTR tunes DL processing performance by regulating the memory-access / feed-forward / back-propagation (MFB) time per minibatch in deep neural network (DNN) model training. By integrating a GPU frequency scaling algorithm with the SQP method for NMPC optimization, DeepPow-CTR achieves real-time GPU power control for cost-efficient DL processing.

D. NONLINEAR MPC FORMULATION FOR COST EFFICIENT DL PROCESSING IN CLUSTER

The objective of DeepPow-CTR is to find the optimal GPU frequency setting in real time for cost-efficient DL processing, in response to renewable generation and electricity price. We present the details of the SQP method used to obtain the optimal solution of Problem II-D.
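
To make the SQP step concrete, consider a hypothetical single-variable, single-step version of the problem: choose a continuous GPU frequency f in [lo, hi] that minimizes J(f) = price · P(f) / 1000 + w · T(f), with P(f) = 60 + 50 f³ W and T(f) = 0.6 / f s. This is not the paper's Problem II-D; the model, the constants, and the names make_cost, damped_secant, and sqp_box_1d are illustrative assumptions. With only bound constraints, the QP subproblem of SQP has a closed-form solution (the clipped quasi-Newton step), and the Hessian approximation is updated with the one-dimensional form of Powell's damped BFGS update.

```python
from typing import Callable, Tuple

# Hypothetical single-step model (illustrative, not the paper's Problem II-D):
#   J(f) = price * P(f) / 1000 + w * T(f),  P(f) = P_IDLE + K * f**3 [W],  T(f) = C / f [s]
P_IDLE, K, C = 60.0, 50.0, 0.6

Fn = Callable[[float], float]


def make_cost(price: float, w: float) -> Tuple[Fn, Fn]:
    """Return the cost J and its derivative dJ for a given price and weight."""
    def J(f: float) -> float:
        return price * (P_IDLE + K * f ** 3) / 1000.0 + w * C / f

    def dJ(f: float) -> float:
        return price * 3.0 * K * f ** 2 / 1000.0 - w * C / f ** 2

    return J, dJ


def damped_secant(B: float, s: float, y: float, c: float = 0.2) -> float:
    """One-dimensional Powell-damped BFGS update; the result is >= c * B > 0."""
    sBs, sy = B * s * s, s * y
    theta = 1.0 if sy >= c * sBs else (1.0 - c) * sBs / (sBs - sy)
    r = theta * y + (1.0 - theta) * B * s
    return r / s


def sqp_box_1d(J: Fn, dJ: Fn, f0: float, lo: float, hi: float,
               B0: float = 1.0, tol: float = 1e-8, max_iter: int = 100) -> float:
    f, B, g = f0, B0, dJ(f0)
    for _ in range(max_iter):
        # QP subproblem: min_d g*d + 0.5*B*d**2  s.t.  lo <= f + d <= hi.
        # With B > 0 in one dimension, its solution is the clipped step -g/B.
        d = min(max(-g / B, lo - f), hi - f)
        if abs(d) < tol:
            break  # projected step vanishes: first-order (KKT) conditions hold
        # Armijo backtracking; since f stays in [lo, hi], clipping never flips
        # the sign of d, so g*d <= 0 and d is a descent direction.
        alpha, J_f = 1.0, J(f)
        while J(f + alpha * d) > J_f + 1e-4 * alpha * g * d:
            alpha *= 0.5
            if alpha < 1e-10:
                return f  # no acceptable decrease: treat f as converged
        s = alpha * d
        f_new = f + s
        g_new = dJ(f_new)
        B = damped_secant(B, s, g_new - g)
        f, g = f_new, g_new
    return f
```

With price = 1 and w = 0.25, the stationarity condition 0.15 f² = 0.15 / f² gives f* = 1, which sqp_box_1d(J, dJ, 1.5, 0.6, 1.5) approaches. Raising the price to 100 pushes the unconstrained minimizer below 0.6, so the solver stops at the lower frequency bound.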

BFGS BASED SQP METHOD FOR DL POWER
PERFORMANCE EVALUATION
Findings
CONCLUSION
