Abstract
We present a case study describing efforts to optimise and modernise "Modal", the simulation and analysis pipeline used by the Planck satellite experiment for constraining general non-Gaussian models of the early universe via the bispectrum (or three-point correlator) of the cosmic microwave background radiation. We focus on one particular element of the code: the projection of bispectra from the end of inflation to the spherical shell at decoupling, which defines the CMB we observe today. This code involves a three-dimensional inner product between two functions, one of which requires an integral, on a non-rectangular domain containing a sparse grid. We show that by employing separable methods this calculation can be reduced to a one-dimensional summation plus two integrations, reducing the overall dimensionality from four to three. The introduction of separable functions also solves the issue of the non-rectangular sparse grid. This separable method can become unstable in certain cases and so the slower non-separable integral must be calculated instead. We present a discussion of the optimisation of both approaches. We show significant speed-ups of ~100x, arising from a combination of algorithmic improvements and architecture-aware optimisations targeted at improving thread and vectorisation behaviour. The resulting MPI/OpenMP hybrid code is capable of executing on clusters containing processors and/or coprocessors, with strong-scaling efficiency of 98.6% on up to 16 nodes. We find that a single coprocessor outperforms two processor sockets by a factor of 1.3x and that running the same code across a combination of both microarchitectures improves performance-per-node by a factor of 3.38x. By making bispectrum calculations competitive with those for the power spectrum (or two-point correlator) we are now able to consider joint analysis for cosmological science exploitation of new data.
Highlights
The current best explanation for the origin of our universe is the inflationary big bang scenario, where it is believed that a period of exponential expansion created the large flat empty universe we see today
This paper investigates the optimisation and modernisation of Modal, as part of an effort to accelerate it using Intel® Xeon Phi™ coprocessors
The existing MPI-level parallelism in the original code is not sufficient to enable efficient utilisation of this hardware, and we show that moving to a hybrid MPI/OpenMP implementation can significantly improve performance
Summary
The current best explanation for the origin of our universe is the inflationary big bang scenario, where it is believed that a period of exponential expansion created the large, flat, empty universe we see today. The primary obstacle to naïve estimation of the bispectrum is that for the CMB it is five-dimensional and would require O(10^22) floating-point operations to calculate, which is challenging for the world's largest supercomputers. Although this can be overcome by using separable approximations for the bispectra, the projection of the primordial bispectra forward to the time of observation remains a major obstacle to measurement.
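To illustrate why separable approximations reduce the cost, the following minimal sketch compares a brute-force three-dimensional weighted sum with its factorised form. It is not the Modal code: the function names, the test functions, the uniform weights, and the rectangular grid are all assumptions chosen for illustration, whereas the real calculation works on a non-rectangular domain. When the integrand factorises as f(x) g(y) h(z), the triple sum collapses into a product of three one-dimensional sums, so the cost drops from O(n^3) to O(n).

```python
import math


def brute_force_inner_product(f, g, h, xs, ws):
    """Triple sum over a rectangular grid: O(n^3) function evaluations."""
    total = 0.0
    for xi, wi in zip(xs, ws):
        for xj, wj in zip(xs, ws):
            for xk, wk in zip(xs, ws):
                total += wi * wj * wk * f(xi) * g(xj) * h(xk)
    return total


def separable_inner_product(f, g, h, xs, ws):
    """Same quantity as a product of three 1D sums: O(n) evaluations."""
    sum_f = sum(w * f(x) for x, w in zip(xs, ws))
    sum_g = sum(w * g(x) for x, w in zip(xs, ws))
    sum_h = sum(w * h(x) for x, w in zip(xs, ws))
    return sum_f * sum_g * sum_h


# Hypothetical grid and weights (uniform, for illustration only).
n = 20
xs = [i / (n - 1) for i in range(n)]
ws = [1.0 / n] * n

slow = brute_force_inner_product(math.sin, math.cos, math.exp, xs, ws)
fast = separable_inner_product(math.sin, math.cos, math.exp, xs, ws)

# The two results agree to floating-point rounding.
print(math.isclose(slow, fast, rel_tol=1e-9))
```

The factorised form evaluates each function only n times, which is the essential saving a separable expansion offers. A general bispectrum is not a single product, so in practice it would be expanded as a sum of such separable terms.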