Abstract

Several ensemble-based Data Assimilation (DA) methods rely on a propagate/update cycle, in which a potentially compute-intensive simulation code propagates multiple states over several consecutive time steps; these states are then analyzed to update the states to be propagated in the next cycle. In this paper we focus on DA methods whose update can be computed by gathering only lightweight data obtained independently from each propagated state. This encompasses particle filters, where one weight is computed per state, but also methods such as Approximate Bayesian Computation (ABC) and Markov Chain Monte Carlo (MCMC). Such methods can be very compute-intensive, and running them efficiently at scale on supercomputers is challenging. This paper proposes a framework based on an elastic and fault-tolerant runner/server architecture that minimizes data movement while enabling dynamic load balancing. Our approach relies on runners that load, propagate, and store particles from an asynchronously managed distributed particle cache, permitting particles to move from one runner to another in the background while particle propagation proceeds. The framework is validated with a bootstrap particle filter running the WRF simulation code. We handle up to 2,555 particles on 20,442 compute cores. Compared to a file-based implementation, our solution spends up to 2.84× fewer resources (core×seconds) per particle.
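To make the propagate/update cycle concrete, the sketch below shows a minimal bootstrap particle filter in Python. It is only illustrative: a hypothetical toy scalar model stands in for the compute-intensive simulation code (WRF in the paper), and all function and variable names are assumptions, not the paper's API. It shows the pattern the abstract describes: each particle is propagated independently, a single lightweight weight is gathered per state, and resampling selects the particles for the next cycle.

```python
import numpy as np

rng = np.random.default_rng(0)

def propagate(state):
    # Stand-in for the compute-intensive simulation code (hypothetical toy
    # model): advances one particle over one assimilation window.
    return 0.9 * state + rng.normal(scale=0.5)

def weight(state, observation):
    # Lightweight per-particle statistic: unnormalized Gaussian likelihood
    # of the observation given the propagated state.
    return np.exp(-0.5 * (observation - state) ** 2)

n_particles = 100
particles = rng.normal(size=n_particles)   # initial ensemble
observations = [1.0, 1.2, 0.8]             # one observation per cycle

for obs in observations:
    # Propagation phase: each particle advances independently
    # (done by the runners in the paper's architecture).
    particles = np.array([propagate(p) for p in particles])
    # Update phase: gather only one lightweight weight per state
    # (the server needs these weights, not the full states).
    w = np.array([weight(p, obs) for p in particles])
    w /= w.sum()
    # Bootstrap resampling: choose which particles to propagate next cycle.
    idx = rng.choice(n_particles, size=n_particles, p=w)
    particles = particles[idx]
```

In the paper's setting, the resampled particle identities are the only information the server must send back; the full particle states stay in the distributed cache on the runners, which is what keeps data movement low.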
