Recent advances in cloud computing, large language models, and deep learning have sparked a race to build massive High-Performance Computing (HPC) centers worldwide. The energy consumption of these centers grows in proportion to their computing capabilities; for example, according to the TOP500 organization, the HPC systems Frontier, Aurora, and Supercomputer Fugaku report energy consumption of 22,786 kW, 38,698 kW, and 29,899 kW, respectively. Energy-aware scheduling is currently a topic of interest to many researchers. However, to the best of our knowledge, this work is the first approach that considers the idle energy consumption of HPC units and the possibility of turning off unused units entirely, driven by a quantitative objective function. We found that even when unused machines are turned off, the makespan and energy consumption objectives still conflict, which confirms the multi-objective nature of the problem. This work presents empirical results for AGEMOEA, AGEMOEA2, GWASFGA, MOCell, MOMBI, MOMBI2, NSGA2, and SMS-EMOA. MOCell is the best-performing algorithm on the 400 real scheduling problem instances, whereas GWASFGA performs best on a small-instance synthetic testbed.
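To make the reported trade-off concrete, the following is a minimal illustrative sketch, not the paper's actual scheduling model: it assumes a simple bi-objective evaluation in which each machine runs its assigned jobs sequentially, used machines draw idle power until the global makespan, and machines with no assigned jobs are powered off entirely. All machine speeds, power figures, and job sizes are hypothetical.

```python
def evaluate(schedule, machines, jobs):
    """Return (makespan, energy) for a job-to-machine assignment.

    schedule[j] is the index of the machine running job j. Each machine runs
    its jobs sequentially; a machine that received at least one job draws idle
    power from the end of its last job until the makespan, while a machine
    with no jobs is switched off and contributes no energy at all.
    """
    busy = [0.0] * len(machines)            # per-machine busy time
    active_energy = 0.0
    for j, m in enumerate(schedule):
        runtime = jobs[j] / machines[m]["speed"]
        busy[m] += runtime
        active_energy += runtime * machines[m]["p_active"]

    makespan = max(busy)
    idle_energy = sum(
        (makespan - t) * machines[m]["p_idle"]
        for m, t in enumerate(busy) if t > 0   # unused machines are off
    )
    return makespan, active_energy + idle_energy


# Hypothetical testbed: one fast but power-hungry machine, one slow but frugal one.
machines = [
    {"speed": 2.0, "p_active": 300.0, "p_idle": 120.0},
    {"speed": 1.0, "p_active": 120.0, "p_idle": 50.0},
]
jobs = [8.0, 8.0, 4.0]  # abstract work units

# Spreading work minimizes makespan; packing everything on the frugal machine
# lets the other be powered off, saving energy but stretching the makespan.
print(evaluate([0, 1, 0], machines, jobs))  # (8.0, 3000.0)  -> shorter, costlier
print(evaluate([1, 1, 1], machines, jobs))  # (20.0, 2400.0) -> longer, cheaper
```

Even with the power-off option, neither toy schedule dominates the other, which is the kind of conflict the multi-objective algorithms compared in this work are designed to navigate.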