Abstract

The call for ever-increasing model resolutions and physical processes in climate and weather models demands a continual increase in computing power. The IBM Cell processor's order-of-magnitude peak performance increase over conventional processors makes it very attractive for meeting this requirement. However, the Cell's characteristics, namely 256 kB of local memory per SPE and a new low-level communication mechanism, make porting an application very challenging. As a trial, we selected the solar radiation component of the NASA GEOS-5 climate model, which (1) is representative of column-physics components (half of the total computational time), (2) has an extremely high computational intensity, i.e., a high ratio of computational load to main-memory transfers, and (3) exhibits embarrassingly parallel column computations. In this paper, we converted the baseline code (single-precision Fortran) to C and ported it to an IBM BladeCenter QS20. For performance, we manually SIMDized four independent columns and included several unrolling optimizations. Our results show that, compared with the baseline implementation running on one core of an Intel Xeon Woodcrest, Xeon Dempsey, and Itanium2, the Cell is approximately 8.8x, 11.6x, and 12.8x faster, respectively. Our preliminary analysis shows that the Cell can also accelerate the dynamics component (∼25% of the total computational time). We believe these dramatic performance improvements make the Cell processor very competitive as an accelerator. Copyright © 2009 John Wiley & Sons, Ltd.
