Abstract

Reducing energy consumption without affecting computational performance is a significant research driver in computer engineering. The Partitioned Global Address Space (PGAS) programming model provides a global address space for ease of use while providing locality-awareness for efficient execution. On symmetric multiprocessor (SMP) clusters, PGAS locality-awareness offers opportunities for intelligent energy management because of the inherent latencies of the interconnection network fabric. In this paper, the second in a series that explores PGAS power optimization, we present a detailed examination of how data locality-awareness can be exploited to reduce the energy consumption of workload sharing on SMP clusters. We present a systematic methodology for analyzing a given algorithm's potential for locality-aware power optimization. We show how to automate the code modifications for an iterative stencil loop algorithm using a profiler that analyzes the algorithm's non-local access patterns and rewrites the code to pre-fetch that data into thread-local storage. We explore the use of dynamic voltage and frequency scaling (DVFS) in conjunction with time-based optimizations, and we provide an extensible and precise power measurement framework for generating the experimental results. We compare optimized and CPU-default approaches to applying DVFS at both the program-execution level and the processor-architecture level. We show how the latency of the cluster's interconnection network can affect the opportunities for power optimization by comparing the programs' energy/time behavior when the cluster nodes are connected by Ethernet or InfiniBand network fabrics. For our representative iterative stencil loop algorithm, locality-aware power optimizations yield significant improvements in energy efficiency over native code running in the default processor power modes.
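The idea of thread-local data pre-fetch for a stencil can be illustrated with a minimal sketch. It is not the profiler or code transformation described in the paper. It assumes a 1-D three-point Jacobi stencil over a block-distributed array, with PGAS threads simulated sequentially in one Python process. All names (`N`, `NTHREADS`, `step_naive`, `step_prefetch`, `run`) are hypothetical. In the naive sweep, each thread reads neighbor values directly from the shared array, so the reads at block edges would be fine-grained remote accesses in a real PGAS runtime. In the pre-fetch sweep, each thread first copies its block plus one ghost cell on each side into a private buffer, then computes only from that buffer.

```python
N = 16          # global array length (hypothetical)
NTHREADS = 4    # number of simulated PGAS threads (hypothetical)
BLOCK = N // NTHREADS


def step_naive(grid):
    """One Jacobi sweep reading neighbours straight from the shared array.
    In a PGAS runtime, reads that cross a block edge would be remote."""
    out = grid[:]
    for i in range(1, N - 1):
        out[i] = (grid[i - 1] + grid[i] + grid[i + 1]) / 3.0
    return out


def step_prefetch(grid):
    """Same sweep, but each thread first copies its block plus one ghost
    cell per side into a private buffer, then computes only on that buffer."""
    out = grid[:]
    for t in range(NTHREADS):
        lo, hi = t * BLOCK, (t + 1) * BLOCK
        # Pre-fetch step: neighbouring edge values become thread-local ghosts.
        left = grid[lo - 1] if lo > 0 else None
        right = grid[hi] if hi < N else None
        local = [left] + grid[lo:hi] + [right]
        for j in range(1, BLOCK + 1):
            g = lo + j - 1              # global index of local[j]
            if 0 < g < N - 1:           # global boundary cells stay fixed
                out[g] = (local[j - 1] + local[j] + local[j + 1]) / 3.0
    return out


def run(step, iters=10):
    """Apply `step` for `iters` sweeps from a fixed-boundary initial grid."""
    grid = [0.0] * N
    grid[0], grid[-1] = 1.0, 1.0        # fixed boundary values
    for _ in range(iters):
        grid = step(grid)
    return grid


if __name__ == "__main__":
    # Both variants perform the same arithmetic in the same order,
    # so their results should match exactly.
    print(run(step_naive) == run(step_prefetch))
```

The two sweeps compute identical values. The only difference is where each thread's neighbor data comes from. With pre-fetch, the data a thread needs from other partitions is gathered in one bulk step before computation rather than read piecemeal during it. On a real cluster, that grouping is what creates the predictable communication and computation phases that a frequency-scaling policy could target.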
