Abstract
Chemistry transport models are generally claimed to be well suited for massively parallel processing on distributed memory architectures, since their arithmetic-to-communication ratio is usually high. However, this observation alone does not guarantee efficient parallel performance as model complexity increases. Modeling the local state of the atmosphere causes very different branches of the module code to be executed, so the computational workload, and consequently the runtime, of individual processors differs far more within a time step than has been reported for meteorological models. Variable emissions, changes in actinic fluxes, and all processes associated with cloud modeling vary strongly in time and space and are identified as sources of large load imbalances that severely degrade parallel efficiency. The effect is stronger when the model domain encompasses more heterogeneous meteorological or regional regimes, which affect simulations of atmospheric chemistry processes differently. These conditions hold for the EURAD model applied in this study, whose integration domain covers the European continental scale. Based on a master-worker configuration with horizontal grid partitioning, a method is proposed in which the integration domain of each processor is adjusted locally to compensate for load imbalances. This keeps the communication volume minimal and restricts data exchange to nearest neighbors. The adjustments of the processors' interior boundaries are combined with the routine boundary exchange that is required at every time step anyway. Two dynamic load balancing schemes were implemented and compared against a conventional equal-area partition and a static load balancing scheme. The methods are devised for massively parallel distributed memory computers of both Single and Multiple Instruction stream, Multiple Data stream (SIMD, MIMD) types.
A midsummer episode of highly elevated ozone concentrations over parts of Europe was taken as the test case. The dynamic load balancing approaches were found to perform significantly better and to reduce processor idle times considerably. The efficiency was raised to 62% for a 128-processor configuration.
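
The idea of shifting interior boundaries between neighboring processors can be illustrated with a minimal sketch. It is not the EURAD implementation: it assumes a one-dimensional strip partition instead of the two-dimensional horizontal partitioning described above, it assumes that a per-column cost measured during the previous time step is available, and it ignores communication cost when estimating efficiency. The names `strip_loads`, `rebalance_step`, and `parallel_efficiency` are hypothetical.

```python
def strip_loads(bounds, column_costs):
    """Total measured cost of each strip; strip p owns columns bounds[p]:bounds[p+1]."""
    return [sum(column_costs[bounds[p]:bounds[p + 1]])
            for p in range(len(bounds) - 1)]


def rebalance_step(bounds, column_costs, min_width=1):
    """One pass that moves each interior boundary by at most one column.

    A column is handed to the neighboring strip only if this lowers the
    larger of the two neighboring loads, so data moves only between
    adjacent processors (assumption: illustrative 1-D rule).
    """
    new = list(bounds)
    for b in range(1, len(new) - 1):
        left = sum(column_costs[new[b - 1]:new[b]])
        right = sum(column_costs[new[b]:new[b + 1]])
        if left > right and new[b] - new[b - 1] > min_width:
            c = column_costs[new[b] - 1]      # last column of the left strip
            if max(left - c, right + c) < left:
                new[b] -= 1                   # left strip gives it to the right
        elif right > left and new[b + 1] - new[b] > min_width:
            c = column_costs[new[b]]          # first column of the right strip
            if max(left + c, right - c) < right:
                new[b] += 1                   # right strip gives it to the left
    return new


def parallel_efficiency(loads):
    """Mean load divided by maximum load (assumption: no communication cost)."""
    return sum(loads) / (len(loads) * max(loads))


# Hypothetical costs: cheap clear-sky columns on the left, costly cloudy ones on the right.
costs = [1, 1, 1, 1, 4, 4, 4, 4]
equal_area = [0, 4, 8]                         # two strips of four columns each
adjusted = rebalance_step(equal_area, costs)   # boundary moves to [0, 5, 8]
```

In this toy case the equal-area partition yields loads of 4 and 16 (efficiency 0.625), and a single adjustment yields loads of 8 and 12 (efficiency of about 0.83). A further pass leaves the boundary unchanged, because moving another column would not reduce the larger load. In a real model the boundary shift would be carried out together with the regular halo exchange, as the abstract describes.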