Abstract
Efficient utilization of high-performance computing (HPC) system resources under strict electric power budgets or I/O workload constraints is among the most important goals for system operators who must meet the demanding requirements of application users. In most HPC jobs, the utilization of CPU and memory devices, which is tightly linked to electric power consumption, acts as a counterpart metric to I/O activity. To achieve higher utilization of HPC systems under strict constraints on electric power consumption and I/O activity, we must prevent hot-spots from developing in power consumption or I/O operations, because such hot-spots could exceed the capabilities of the electric power supply or the I/O subsystem and lead to unstable system operation. One feasible solution is to arrange compute node assignments so that such hot-spots do not arise. To address this issue, we analyzed vast amounts of log data collected from the K computer and found strong positive correlations between CPU and memory device utilization rates and electric power consumption levels. On the other hand, we also observed strong negative correlations, with reduced electric power consumption, in relation to file I/O activities in a specific compute node-layout, which indicates unique characteristics of some I/O-intensive HPC jobs in that node-layout. Our investigation revealed that HPC jobs can be divided into two groups in terms of required electric power: jobs that consume high levels of electric power, and I/O-intensive jobs with reduced electric power levels. We then achieved high accuracy in classifying jobs by electric power level using RandomForestClassifier, chosen from among the multiple machine learning classification models provided by scikit-learn. This classification can help prevent hot-spots in electric power consumption when compute nodes are assigned during job scheduling. Thus, we demonstrated efficient job classification towards power-aware system operation on the supercomputer Fugaku, which is the successor to the K computer.
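As a rough illustration of the kind of correlation analysis described above, the following minimal sketch computes a Pearson correlation coefficient in plain Python. The function name `pearson`, the variable names, and all numeric values are assumptions made up for this example; they are not K computer log data, and this is not necessarily the statistic or procedure used in the study. The toy values were chosen only to show a positive and a negative correlation side by side.

```python
import math


def pearson(xs, ys):
    """Return the Pearson correlation coefficient of two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sum of co-deviations and sums of squared deviations from the means.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical per-job values, invented for illustration only.
cpu_util = [0.20, 0.35, 0.50, 0.65, 0.80, 0.95]  # CPU utilization (fraction)
io_rate = [9.0, 7.5, 6.0, 4.0, 2.5, 1.0]         # file I/O activity (arbitrary units)
power = [10.0, 12.0, 14.5, 16.0, 18.5, 20.0]     # electric power (arbitrary units)

r_cpu = pearson(cpu_util, power)  # positive for these toy values
r_io = pearson(io_rate, power)    # negative for these toy values
print(f"CPU vs. power: {r_cpu:+.3f}")
print(f"I/O vs. power: {r_io:+.3f}")
```

With real job logs, the same function could be applied to each candidate metric against measured power to decide which features are worth passing to a classifier.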