Abstract

Power and energy consumption are becoming key challenges for the supercomputers’ exascale race. HPC systems’ processors waist active power during communication and synchronization among the MPI processes in large-scale HPC applications. However, due to the time scale at which communication happens, transitioning into low-power states while waiting for the completion of each communication may introduce unacceptable overhead. In this article, we present COUNTDOWN, a run-time library for identifying and automatically reducing the power consumption of the CPUs during communication and synchronization. COUNTDOWN saves energy without penalizing the time-to-completion by lowering CPUs power consumption only during idle times for which power state transition overhead is negligible. This is done transparently to the user, without requiring labor-intensive and error-prone application code modifications, nor requiring recompilation of the application. We test our methodology on a production Tier-1 system. For the NAS benchmarks, COUNTDOWN saves between 6 and 50 percent energy, with a time-to-solution penalty lower than 5 percent. In a complete production—Quantum ESPRESSO—for a 3.5K cores run, COUNTDOWN saves 22.36 percent energy, with a performance penalty below 3 percent. Energy saving increases to 37 percent with a performance penalty of 6.38 percent, if the application is executed without communication tuning.

Full Text
Published version (Free)

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call