Simulating the transfer of mass between particles is not straightforward to parallelize because it requires calculating the influence of many particles on each other. Engdahl et al. (2019) intuited that the number of matrix operations used for mass transfer grows quadratically with the number of particles, so dividing the domain geometrically into sub-domains gives speed and memory advantages, even on a single processing thread. Those authors also showed how the speed of several one-dimensional examples scales on multiple cores. Here, we extend those results to more general cases, both in terms of spatial dimensions and algorithmic implementation. We show that there is an optimal subdivision scheme for naive, full-matrix calculations on a multi-processor or multi-threaded shared-memory machine. A similar sparse-matrix implementation that also uses row-and-column-sum normalization often greatly reduces the memory requirements. We also introduce a completely new mass transfer algorithm that uses a non-geometric domain decomposition and only matrix row-sum normalization. This allows the mass-transfer “matrix” to be constructed and solved one row at a time in parallel, so it is faster and vastly more memory-efficient than previous methods, but requires more care to achieve suitable accuracy.
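The quadratic-growth argument can be illustrated with a minimal counting sketch. It assumes a dense matrix with one entry per particle pair, an even split of particles among sub-domains, and no interactions across sub-domain boundaries; these are simplifications made here for illustration only, and the function names are hypothetical rather than taken from any of the cited implementations.

```python
def full_matrix_entries(n_particles: int) -> int:
    """Entries in one dense N x N mass-transfer matrix."""
    return n_particles * n_particles


def subdivided_entries(n_particles: int, n_subdomains: int) -> int:
    """Total entries when particles are split as evenly as possible
    among sub-domains, each with its own dense matrix.

    Assumption (illustrative only): particles in different sub-domains
    do not interact, so no cross-boundary entries are counted.
    """
    base, extra = divmod(n_particles, n_subdomains)
    # The first `extra` sub-domains receive one additional particle.
    sizes = [base + 1 if k < extra else base for k in range(n_subdomains)]
    return sum(size * size for size in sizes)


if __name__ == "__main__":
    n = 10_000
    for s in (1, 10, 100):
        ratio = full_matrix_entries(n) / subdivided_entries(n, s)
        print(f"{s:4d} sub-domains: {subdivided_entries(n, s):>12,d} entries "
              f"({ratio:.0f}x fewer than the full matrix)")
```

Under these assumptions the total entry count falls roughly in proportion to the number of sub-domains, which is why subdivision can pay off even on a single thread. The sketch counts matrix entries only and says nothing about where the optimal subdivision lies once boundary communication and per-sub-domain overhead are included.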