Abstract
A previously reported parallel performance model for angular domain decomposition (ADD) of the discrete ordinates approximation for solving multidimensional neutral particle transport problems is revisited for stronger validation. Three communication schemes (native MPI, the bucket algorithm, and the distributed bucket algorithm) are included in the validation exercise, which is conducted successfully on a Beowulf cluster. The parallel component of the performance model is largely independent of the communication scheme, in contrast with the communication component, which depends strongly on the global reduce algorithm. Correct trends for each component and each communication scheme are measured for the Arbitrarily High Order Transport (AHOT) code, thus validating the performance models. Furthermore, extensive experiments illustrate the superiority of the bucket algorithm, in the sense that it incurs a smaller communication penalty than the native MPI and distributed bucket algorithms.
The primary question addressed in this work is: for a given problem size, which domain decomposition scheme, angular or spatial, is best suited to parallelizing discrete ordinates methods on a specific computational platform? We address this question for three-dimensional applications via the above-mentioned parallel performance model for ADD and a previously constructed and validated model for spatial domain decomposition (SDD). The constructed parallel performance models include parameters specifying the problem size and system performance. We conclude that for large problems the parallel component dwarfs the communication component, even on moderately large numbers of processors. The main advantages of SDD are (a) scalability to higher numbers of processors, of the order of the number of computational cells; (b) a smaller memory requirement; and (c) better performance than ADD on high-end platforms and large numbers of processors. On the other hand, the main advantages of ADD are (a) perfect load balance; (b) simple implementation, even on unstructured grids; and (c) better performance than SDD on medium- and low-end platforms and large numbers of discrete ordinates. It follows that programmers and users of discrete ordinates codes must carefully select the appropriate domain decomposition method for the class of problems and multiprocessor platforms they wish to target.