Abstract

This special issue is the result of an open call for papers initiated after the minisymposium ‘Analysis and Modeling: Techniques and Tools’, held at the Society for Industrial and Applied Mathematics (SIAM) Conference on Parallel Processing in Scientific Computing in Savannah, GA, in February 2012. The minisymposium brought together tool developers, performance and power modeling experts, and application analysts to present the state of the art in performance analysis and modeling techniques. This combination of expertise is much needed at a time when complex, hierarchical architectures are the standard for all highly parallel computer systems. Without a good grip on the relevant performance limitations, any optimization attempt is just a shot in the dark. Hence, it is crucial to fully understand the performance properties and bottlenecks that arise with clusters of multicore/many-core, multi-socket nodes. Another aspect of modern systems is the intricate interplay between power constraints and the need for compute performance, which leads to complicated trade-offs. The challenges ahead are manifold as systems grow in size: while parallelism keeps increasing, memory systems, interconnection networks, storage, and uncertainties in programming models all add to the complexity. Realizing energy savings more rapidly will require significant improvements in measurement resolution and optimization techniques. This special issue therefore focuses on how the performance and power properties of modern highly parallel systems can be analyzed with state-of-the-art modeling and analysis techniques, using real-world applications and tools.

G. Hager, J. Treibig, J. Habich, and G. Wellein [1] introduce simple but insightful analytic models for the execution performance and energy consumption of multicore CPUs. Automatic dynamic voltage and frequency scaling (DVFS) is leveraged by the “Green Queue” framework presented by J. Peraza, A. Tiwari, M. Laurenzano, L. Carrington, and A. E. Snavely [2]. They show that significant energy savings at low performance loss are within reach if DVFS is applied in an application-aware manner. A. D. Breslow, L. Porter, A. Tiwari, M. Laurenzano, L. Carrington, D. M. Tullsen, and A. E. Snavely [3] investigate the potential of job striping, a technique for co-locating HPC workloads with different characteristics on the same CPU chip, and demonstrate increased throughput and energy efficiency for a mix of typical simulation codes on a production cluster. The problem of how to deal with coarse-grained power measurements is tackled by H. Servat, G. Llort, J. Giménez, and J. Labarta [4], who present a tool that can derive fine-grained power and performance data for code with quickly alternating phases. The power usage and power variability of workloads on production supercomputers at Los Alamos National Laboratory are studied by S. Pakin, C. Storlie, M. Lang, R. E. Fields, E. E. Romero, C. Idler, S. Michalak, H. Greenberg, J. Loncaric, R. Rheinheimer, G. Grider, and J. Wendelberger [5]. One of their central findings is that the actual power dissipation under real-world workloads is significantly lower than what the power infrastructure can handle, which opens interesting possibilities for saving cost via power capping.

We think that this selection of papers is unique in providing several very different views on the problem of performance and power efficiency on present-day parallel machines, from the core level to the computing-center level.
