Abstract
Supercomputers are a powerful class of High Performance Computing (HPC) systems used for the most challenging science problems, including drug discovery, earthquake prediction, and climate change impacts. Scientific applications running on supercomputers use highly parallel constructs to iteratively process enormous, multi-dimensional data sets, such as billions of base pairs in a genome or grid units in the atmosphere. The largest HPC systems can consume as much electricity as a small town, so rising power costs and constraints are driving a growing focus on energy efficiency. Techniques that reduce energy consumption help to ease these constraints and lower the cost of important research.

Scientists running applications on HPC systems encounter a number of barriers when using existing energy optimisation methods. Existing methods are typically aimed at parallel application developers and HPC system administrators, rather than application users. Users often do not have the required level of system administration and programming skills. Access to tuning parameter controls may require system privileges that are not available to typical users. The tuning process is often complex and time consuming, which can be a further deterrent when scientists naturally want to focus on their research.

The complexity of optimising energy efficiency is driven by a range of factors. Optimisation methods must manage large search spaces covering the unique characteristics of a particular system and workload and their effect on performance and energy efficiency. Settings for optimum performance and energy efficiency can diverge, so trade-off options need to be identified that guide a suitable balance between energy use and performance. There is also inherent observational and prediction uncertainty in optimisation processes that needs to be considered.

This thesis presents a number of significant advances in the field of energy efficiency optimisation of parallel applications:

- The energy usage and performance impacts of bottlenecks in the system architecture and of user-controllable settings are analysed.
- Statistical and machine learning system models are developed that can be trained at low cost to accurately predict trade-off options using parameters that users can control.
- A novel technique for assessing the impact of experimental error in Pareto-optimal trade-off analysis is presented.
- The design and implementation of a new tool known as HPCProbe, which prototypes the proposed optimisation approach, are described in detail.
- HPCProbe is used to provide a comprehensive experimental evaluation of the method for a collection of parallel kernels and scientific applications.

These advances can enable HPC application users to make accurate performance and energy trade-off decisions at low cost, without specialised programming or system operations skills.
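The abstract refers to identifying Pareto-optimal trade-off options between runtime and energy. As a minimal illustrative sketch only, not taken from the thesis or from HPCProbe, the following Python snippet shows what identifying non-dominated (Pareto-optimal) configurations from measured runtime/energy pairs can look like; the configuration labels and numbers are hypothetical placeholders.

```python
# Hypothetical measurements: configuration -> (runtime in seconds, energy in joules).
measurements = {
    "24 threads, 2.4 GHz": (120.0, 21000.0),
    "24 threads, 1.8 GHz": (138.0, 17500.0),
    "12 threads, 2.4 GHz": (205.0, 19800.0),
    "12 threads, 1.8 GHz": (230.0, 16900.0),
}

def pareto_front(points):
    """Return configurations not dominated in both runtime and energy (both minimised)."""
    front = {}
    for name, (t, e) in points.items():
        dominated = any(
            t2 <= t and e2 <= e and (t2 < t or e2 < e)
            for other, (t2, e2) in points.items()
            if other != name
        )
        if not dominated:
            front[name] = (t, e)
    return front

# Print the trade-off options a user could choose between.
for name, (t, e) in sorted(pareto_front(measurements).items()):
    print(f"{name}: {t:.0f} s, {e:.0f} J")
```

With these placeholder numbers, the slowest-but-not-cheapest configuration is dominated and dropped, leaving a small set of trade-off points ranging from fastest to most energy-efficient, which is the kind of choice the thesis aims to present to application users.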