Performance analysis and optimization on a parallel atmospheric general circulation model code

John D Farrara, John Z Lou

doi:10.1002/(sici)1096-9128(199806)10:7<549::aid-cpe365>3.0.co;2-w

Abstract

An analysis is presented of the primary factors influencing the performance of a parallel implementation of the UCLA atmospheric general circulation model (AGCM) on distributed-memory, massively parallel computer systems. Several modifications to the original parallel AGCM code, aimed at improving its numerical efficiency, load balance and single-node performance, are discussed. The impact of these optimization strategies on performance on two state-of-the-art parallel computers, the Intel Paragon and the Cray T3D, is presented and analyzed. Implementing a load-balanced FFT algorithm reduces overall execution time by approximately 45% compared to the original convolution-based algorithm. Preliminary results from applying a load-balancing scheme to the physics part of the AGCM code suggest that additional reductions in execution time of 10–15% can be achieved. Finally, several strategies for improving the single-node performance of the code are presented; the results obtained thus far suggest that reductions in execution time in the range of 35–45% are possible. © 1998 John Wiley & Sons, Ltd.
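
The contrast between a convolution-based and an FFT-based algorithm can be illustrated with a minimal sketch. The abstract does not describe the operation being computed, so the example assumes a periodic (circular) smoothing of a one-dimensional sequence. The function names, the sample data and the kernel are all hypothetical. The sketch shows only that the two formulations give the same result at different operation counts (roughly n^2 versus n log n); it does not attempt the load balancing across processors that the paper addresses, and it is not the authors' code.

```python
import cmath


def circular_convolve(signal, kernel):
    """Direct circular convolution: about n^2 multiply-adds for length n."""
    n = len(signal)
    return [
        sum(signal[j] * kernel[(i - j) % n] for j in range(n))
        for i in range(n)
    ]


def fft(values, inverse=False):
    """Recursive radix-2 FFT (unnormalized); length must be a power of two."""
    n = len(values)
    if n == 1:
        return list(values)
    sign = 1 if inverse else -1
    even = fft(values[0::2], inverse)
    odd = fft(values[1::2], inverse)
    out = [0j] * n
    for k in range(n // 2):
        twiddle = cmath.exp(sign * 2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + twiddle
        out[k + n // 2] = even[k] - twiddle
    return out


def fft_convolve(signal, kernel):
    """Same circular convolution via the convolution theorem."""
    n = len(signal)
    # Pointwise product in spectral space equals convolution in physical space.
    spectrum = [a * b for a, b in zip(fft(signal), fft(kernel))]
    # The inverse transform above is unnormalized, so divide by n here.
    return [v.real / n for v in fft(spectrum, inverse=True)]


# Hypothetical data: 8 points on a periodic domain and a 3-point smoother.
signal = [1.0, 2.0, 0.0, -1.0, 3.0, 0.5, -2.0, 1.5]
kernel = [0.5, 0.25, 0.0, 0.0, 0.0, 0.0, 0.0, 0.25]

direct = circular_convolve(signal, kernel)
fast = fft_convolve(signal, kernel)
max_diff = max(abs(a - b) for a, b in zip(direct, fast))
```

Here `max_diff` measures the disagreement between the two results, which should be at the level of floating-point round-off.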
