Abstract

We have optimised the atmospheric radiation algorithm of the FAMOUS climate model on several hardware platforms. The optimisation involved translating the Fortran code to C and restructuring the algorithm around the computation of a single air column. Instead of the existing MPI-based domain decomposition, we used a task queue and a thread pool to schedule the computation of individual columns on the available processors. Finally, we packed four air columns together into a single data structure and computed them simultaneously using Single Instruction Multiple Data (SIMD) operations. The modified algorithm runs more than 50 times faster on the CELL's Synergistic Processing Element (SPE) than on its main PowerPC Processing Element (PPE). On Intel-compatible processors, the new radiation code runs 4 times faster than the original. On the tested graphics processor, using OpenCL, we find a speed-up of more than 2.5 times compared with the original code running on the main CPU. Because the radiation code accounts for more than 60% of the total CPU time, FAMOUS as a whole executes more than twice as fast. Our version of the algorithm returns bit-wise identical results, which demonstrates the robustness of our approach. We estimate that this project required around two and a half man-years of work.

Highlights

  • Our work is motivated by the need for faster climate models in order to increase model resolution on current and future computing platforms and/or to increase the size of ensemble simulations

  • To illustrate our point, we studied the code of the FAMOUS climate model (Jones et al., 2005; Smith et al., 2008), a low-resolution version of the better-known HadCM3 model developed by the UK Met Office, which was used by the University of Oxford in the ClimatePrediction.net Millennium experiment

  • It is generally understood that future performance improvements of computing hardware will come mainly from increased use of parallel computing (Asanovic et al., 2009)
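How much a faster radiation code helps the whole model is bounded by Amdahl's law. As a worked illustration, write $p$ for the fraction of total CPU time spent in the radiation code (more than 0.6 according to the abstract) and $s$ for the speed-up of the radiation code alone. Assuming the rest of the model runs unchanged and ignoring any extra overheads, the overall speed-up $S$ is:

```latex
S(p, s) = \frac{1}{(1 - p) + p/s},
\qquad
\lim_{s \to \infty} S(0.6, s) = \frac{1}{0.4} = 2.5,
\qquad
S(0.6, s) > 2 \iff 0.4 + \frac{0.6}{s} < 0.5 \iff s > 6.
```

For example, $S(0.6, 50) = 1/0.412 \approx 2.43$ and $S(0.6, 4) = 1/0.55 \approx 1.82$. These values take $p$ at exactly 0.6. Because the abstract gives 60% as a lower bound, the actual whole-model figures can be somewhat higher.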


Summary

Introduction

Our work is motivated by the need for faster climate models, in order to increase model resolution on current and future computing platforms and/or to increase the size of ensemble simulations. To illustrate our point, we studied the code of the FAMOUS climate model (Jones et al., 2005; Smith et al., 2008), a low-resolution version of the better-known HadCM3 model developed by the UK Met Office, which was used by the University of Oxford in the ClimatePrediction.net Millennium experiment. The CELL processor, positioned somewhere between a generic multi-core chip and a graphics processor, offers a good compromise between the different directions in which hardware is evolving. It has a hybrid multi-core design that combines a generic PowerPC processor and several accelerators, the so-called Synergistic Processing Elements (SPEs), on a single chip. Our results and approach are in line with the work of Zhou et al. (2009), who used the CELL processor to accelerate the radiation computation of the NASA GEOS-5 climate model.

About FAMOUS
About the CELL processor
Profiling
Translating the code to C
Testing platform
The effects of rounding errors on the SPEs
The benchmark tests on the CELL processor
Intel-compatible processors
Graphics processors
Findings
Conclusions