Abstract

For the past thirty years, the need for ever greater supercomputer performance has driven the development of many computing technologies which have subsequently been exploited in the mass market. Delivering an exaflop (a million million million calculations per second) by the end of this decade is the challenge that the supercomputing community worldwide has set itself. Developing techniques and solutions that address the hardest problems posed by computing at the exascale is a formidable task: equipment vendors, programming tools providers, academics, and end users must all work together to build the development and debugging environments, algorithms and libraries, user tools, and the underpinning and cross-cutting technologies required to support the execution of applications at the exascale.

This special issue of the journal is dedicated to advances in high performance computing in engineering on the way to exascale. It contains papers selected from the Exascale Applications and Software Conference (EASC2013), held on 9–11 April 2013 in Edinburgh, United Kingdom. The issue comprises five papers that illustrate recent advances on the path to exascale, covering algorithms, implementations, and applications for solving large-scale engineering problems.

The first paper, by Reverdy et al., reports the realisation of the first cosmological simulations on the scale of the whole observable universe. The paper first focuses on the numerical aspects of two new simulations. In practice, each of these simulations has evolved 550 billion dark matter particles in an adaptive mesh refinement grid, and one of the new simulations has increased the total number of grid points from 2000 billion for the Λ Cold Dark Matter (ΛCDM) model to 2200 billion, owing to the formation of a larger number of structures. The authors highlight the optimisations and adjustments required to run such a set of simulations and then summarise some important lessons learnt for future exascale computing projects. Numerical examples illustrate the effectiveness of the procedure on 4752 nodes of the Curie Bull supercomputer.

The second paper, by Mozdzynski et al., presents the European Centre for Medium-Range Weather Forecasts (ECMWF) Integrated Forecasting System (IFS), enhanced to use Fortran 2008 coarrays to overlap computation and communication in the context of OpenMP parallel regions. Today ECMWF runs a 16 km global T1279 operational weather forecast model using 1536 cores. Following the historical evolution in resolution upgrades, ECMWF could expect to be running a 2.5 km global forecast model by 2030 on an exascale system that should be available, and hopefully affordable, by then. Achieving this would require IFS to run efficiently on about 1000 times the number of cores it uses today. This significant challenge is addressed in the paper, where, after an initial set of improvements, IFS is demonstrated running a 10 km global model efficiently on over 40,000 cores of the HECToR Cray XE6 supercomputer.

The third paper, by Gray et al., describes a multi-GPU implementation of the Ludwig application, which specialises in simulating a variety of complex fluids via lattice Boltzmann fluid dynamics coupled to additional physics describing complex fluid constituents. The authors describe the methodology used to augment the original CPU version with GPU functionality in a maintainable fashion.
After presenting several optimisations that maximise performance on the GPU architecture through tuning for the GPU memory hierarchy, they describe how particles are implemented within the fluid in such a way as to avoid a major divergence of the CPU and GPU codebases, whilst minimising data transfer at each timestep. Numerical results show that the application demonstrates excellent scaling to at least 8192 GPUs in parallel (the largest system tested at the time of writing) on the Titan Cray XK7 supercomputer.

Exascale computers are expected to have highly hierarchical architectures with nodes composed of multi-core processors (CPUs) and accelerators (GPUs). The different programming levels generate new difficulties and algorithmic issues. The paper written by Magoules et al. presents Alinea, which stands for Advanced LINEar Algebra, a library well suited for hybrid CPU/GPU architectures.
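To make the overlap idea running through these papers more concrete, the following is a minimal CUDA sketch, not taken from Ludwig, IFS, or Alinea, of the common pattern of hiding halo (boundary) data transfer behind interior computation using CUDA streams, which is one way such applications minimise the cost of data movement at each timestep. The kernel name update_interior, the lattice sizes, and the D3Q19-style site layout are all hypothetical.

```cuda
// Illustrative sketch only: overlapping halo transfer with interior computation
// using CUDA streams. Names and sizes are hypothetical, not from any paper above.
#include <cstdio>
#include <cuda_runtime.h>

#define NX 256            // lattice sites per GPU along the decomposed direction
#define HALO 1            // halo width in lattice sites
#define SITE_DOUBLES 19   // e.g. a D3Q19 distribution per site (assumption)

// Hypothetical kernel: update only the interior sites, which do not depend on
// data owned by neighbouring GPUs.
__global__ void update_interior(double *f, int nx)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= HALO && i < nx - HALO) {
        for (int q = 0; q < SITE_DOUBLES; ++q) {
            // Placeholder update; a real LB collision/streaming step goes here.
            f[i * SITE_DOUBLES + q] *= 0.99;
        }
    }
}

int main(void)
{
    size_t bytes      = (size_t)NX * SITE_DOUBLES * sizeof(double);
    size_t halo_bytes = (size_t)HALO * SITE_DOUBLES * sizeof(double);

    double *d_f, *h_halo_lo, *h_halo_hi;
    cudaMalloc(&d_f, bytes);
    cudaMemset(d_f, 0, bytes);
    // Pinned host buffers so cudaMemcpyAsync can truly overlap with kernels.
    cudaMallocHost(&h_halo_lo, halo_bytes);
    cudaMallocHost(&h_halo_hi, halo_bytes);

    cudaStream_t s_comm, s_comp;
    cudaStreamCreate(&s_comm);
    cudaStreamCreate(&s_comp);

    // 1. Start copying the boundary sites to the host on the communication
    //    stream; in a multi-GPU run these would then be exchanged between ranks.
    cudaMemcpyAsync(h_halo_lo, d_f, halo_bytes,
                    cudaMemcpyDeviceToHost, s_comm);
    cudaMemcpyAsync(h_halo_hi, d_f + (NX - HALO) * SITE_DOUBLES, halo_bytes,
                    cudaMemcpyDeviceToHost, s_comm);

    // 2. Meanwhile, update the interior sites on the computation stream.
    update_interior<<<(NX + 127) / 128, 128, 0, s_comp>>>(d_f, NX);

    // 3. Wait for both streams before the boundary sites are filled with the
    //    freshly exchanged halo data (omitted here).
    cudaStreamSynchronize(s_comm);
    cudaStreamSynchronize(s_comp);

    printf("halo transfer and interior update overlapped for one timestep\n");

    cudaStreamDestroy(s_comm);
    cudaStreamDestroy(s_comp);
    cudaFreeHost(h_halo_lo);
    cudaFreeHost(h_halo_hi);
    cudaFree(d_f);
    return 0;
}
```

In a real multi-GPU run the host halo buffers would subsequently be exchanged with neighbouring nodes (for example via MPI, or via Fortran coarrays as in the IFS work) before the boundary sites are updated, so that the off-node communication is also hidden behind the interior computation.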
