Abstract

Achieving faster performance without increasing power and energy consumption for computing systems is an outstanding challenge. This paper develops a novel resource allocation scheme for memory-bound applications running on High-Performance Computing (HPC) clusters, aiming to improve application performance without breaching peak power constraints or increasing total energy consumption. Our scheme estimates how the number of processor cores and the CPU frequency setting affect application performance. It then uses this estimate to provide additional compute nodes to memory-bound applications when it is profitable to do so. We implement and apply our algorithm to 12 representative benchmarks from the NAS Parallel Benchmarks and HPC Challenge (HPCC) benchmark suites and evaluate it on a representative HPC cluster. Experimental results show that our approach can effectively mitigate memory contention to improve application performance, and it achieves this without significantly increasing the peak power and overall energy consumption. Our approach obtains on average a 12.69% performance improvement over the default resource allocation strategy, but uses 7.06% less total power, which translates into 17.77% energy savings.

Highlights

  • High-Performance Computing (HPC) systems capability is increasingly constrained by their power consumption, and this will become worse due to the end of Dennard scaling [1,2]

  • Experimental results show that our approach achieves on average a 12.69% performance improvement over the conventional resource allocation strategy, but uses 7.06% less total power, which translates into 17.77% energy savings

  • This paper has presented a novel resource allocation scheme for HPC workloads, targeting memory-bound data-parallel applications


Introduction

Tsinghua Science and Technology, June 2021, 26(3): 370–383

High-Performance Computing (HPC) systems capability is increasingly constrained by their power consumption, and this will become worse due to the end of Dennard scaling [1,2]. Prior work uses software-based resource scheduling, carefully choosing computation resource settings such as the number of assigned compute nodes and the processor frequency, to match the workload and improve application performance under a power constraint [12,13]. A default strategy for this problem is to allocate the minimal number of compute nodes needed to run one parallel process per physical core. Our goal is to determine the optimal number of compute nodes and the processor clock frequency of each compute node to reduce the running time of memory-bound applications, while capping peak power and energy consumption at the levels of the default strategy.
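The selection step described above can be sketched as a small search over candidate allocations. This is a hypothetical illustration only: the performance and power models below (`perf_model`, `NODE_POWER`, and all their parameters) are toy placeholders standing in for the paper's actual estimator, not the authors' implementation.

```python
# Toy per-node power draw (watts) at each available CPU frequency (Hz).
# These numbers are illustrative assumptions, not measured values.
NODE_POWER = {2.0e9: 180.0, 2.6e9: 220.0, 3.0e9: 260.0}

def perf_model(nodes, freq, base_time=100.0, mem_bound_frac=0.6):
    """Toy runtime estimate for a memory-bound application: the
    memory-bound fraction shrinks as processes spread over more nodes
    (less per-node memory contention), while the compute-bound fraction
    scales inversely with CPU frequency."""
    compute = base_time * (1 - mem_bound_frac) * (3.0e9 / freq)
    memory = base_time * mem_bound_frac * (4 / nodes)
    return compute + memory

def choose_allocation(power_cap, min_nodes=4, max_nodes=16):
    """Pick the (nodes, freq) pair with the lowest predicted runtime
    whose aggregate power stays under the cap; the default allocation
    (minimal nodes, top frequency) is the fallback."""
    default_freq = max(NODE_POWER)
    best = (min_nodes, default_freq, perf_model(min_nodes, default_freq))
    for nodes in range(min_nodes, max_nodes + 1):
        for freq, watts in NODE_POWER.items():
            if nodes * watts > power_cap:
                continue  # would breach the peak power constraint
            t = perf_model(nodes, freq)
            if t < best[2]:
                best = (nodes, freq, t)
    return best

# With a cap equal to 8 nodes at full frequency, the search trades
# frequency and node count to minimize the predicted runtime.
nodes, freq, runtime = choose_allocation(power_cap=8 * 260.0)
```

Under this toy model the search doubles the node count from the default 4 to 8 at full frequency, cutting the predicted runtime from 100 to 70 time units while staying within the cap, which mirrors the paper's idea of adding nodes to memory-bound jobs when profitable.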
