Abstract

Recently, renewable energy has emerged as an attractive means to reduce energy consumption costs for deep learning (DL) job processing in modern GPU-based clusters. In this paper, we propose a novel Renewable-Aware Frequency Scaling (RA-FS) approach for energy-efficient DL clusters. We have developed a real-time GPU core and memory frequency scaling method that finely tunes the training performance of DL jobs while maximizing renewable energy utilization. We introduce quantitative metrics: Deep Learning Job Requirement (DJR) and Deep Learning Job Completion per Slot (DJCS) to accurately evaluate the service quality of DL job processing. Additionally, we present a log-transformation technique to convert our non-convex optimization problem into a solvable one, ensuring the rigorous optimality of the derived solution. Through experiments involving deep neural network (DNN) model training jobs such as SqueezeNet, PreActResNet, and SEResNet on NVIDIA GPU devices like RTX3060, RTX3090, and RTX4090, we validate the superiority of our RA-FS approach. The experimental results show that our approach significantly improves performance requirement satisfaction by about 71% and renewable energy utilization by about 31% on average, compared to recent competitors.

Full Text
Published version (Free)

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call