Abstract

Earth System Models (ESMs) are complex systems used in weather and climate studies. They are generally built from independent components, each responsible for simulating a specific realm (ocean, atmosphere, biosphere, etc.). To replicate the interactions between these realms, ESMs typically use coupling libraries that manage the synchronization and field exchanges between the individual components, which run in parallel as a Multiple-Program, Multiple-Data (MPMD) application. As ESMs grow more complex (higher resolution, more components, more configurations, etc.), achieving the best performance on High-Performance Computing platforms has become increasingly challenging and a major concern. One of the critical bottlenecks is load imbalance, where the fastest components have to wait for the slower ones. Finding the optimal number of processing elements to assign to each component, so as to minimize the performance loss due to synchronization and maximize the overall parallel efficiency, is impossible without the right performance metrics, methodology, and tools. This paper presents the results of balancing multiple Coupled Model Intercomparison Project phase 6 (CMIP6) configurations of the EC-Earth3 ESM. We show that intuitive approaches can lead to suboptimal resource allocations, and we propose new setups that are up to 25% faster while reducing the computational cost by 72%. We prove that new methods are needed to deal with the load balance of ESMs and hope that our study will serve as a guide for optimizing any other coupled system.