We present provably efficient parallel algorithms for sweep scheduling, which is a commonly used technique in radiation transport problems, and involves inverting an operator by iteratively sweeping across a mesh from multiple directions. Each sweep involves solving the operator locally at each cell. However, each direction induces a partial order in which this computation can proceed. On a distributed computing system, the goal is to schedule the computation, so that the length of the schedule is minimized. Due to efficiency and coupling considerations, we have an additional constraint, namely, a mesh cell must be processed on the same processor along each direction. Problems similar in nature to sweep scheduling arise in several other applications, and here we formulate a combinatorial generalization of this problem that captures the sweep scheduling constraints, and call it the generalized sweep scheduling problem. Several heuristics have been proposed for this problem; see [S. Pautz, An algorithm for parallel $S_n$ sweeps on unstructured meshes, Nucl. Sci. Eng. 140 (2002) 111–136; S. Plimpton, B. Hendrickson, S. Burns, W. McLendon, Parallel algorithms for radiation transport on unstructured grids, Super Comput. (2001)] and the references therein; but none of these have provable worst case performance guarantees. Here we present a simple, almost linear time randomized algorithm for the generalized sweep scheduling problem that (provably) gives a schedule of length at most $O(\log^2 n)$ times the optimal schedule for instances with $n$ cells, when the communication cost is not considered, and a slight variant, which coupled with a much more careful analysis, gives a schedule of (expected) length $O(\log m \log\log\log m)$ times the optimal schedule for $m$ processors. These are the first such provable guarantees for this problem.
The algorithm can be extended, at the cost of an additional multiplicative factor, to the case of inter-processor communication latency, in the models of Rayward-Smith [UET scheduling with inter-processor communication delays, Discrete Appl. Math. 18 (1) (1987) 55–71] and Hwang et al. [Scheduling precedence graphs in systems with inter-processor communication times, SIAM J. Comput. 18 (2) (1989) 244–257]. Our algorithms are extremely simple and use no geometric information about the mesh; therefore, these techniques are likely to be applicable in more general settings. We also design a priority-based list schedule using these ideas, with the same theoretical guarantee but much better performance in practice; combining this algorithm with a simple block decomposition also lowers the overall communication cost significantly. Finally, we perform a detailed experimental analysis of our algorithm. Our results indicate that the schedules it produces compare favorably in length with those produced by other natural and efficient parallel algorithms proposed in the literature [S. Pautz, An algorithm for parallel $S_n$ sweeps on unstructured meshes, Nucl. Sci. Eng. 140 (2002) 111–136; S. Plimpton, B. Hendrickson, S. Burns, W. McLendon, Parallel algorithms for radiation transport on unstructured grids, Super Comput. (2001)].
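To make the problem statement concrete, the following is a minimal sketch of a generalized sweep scheduling instance and of one possible randomized, priority-based list schedule for it. It is an illustration under stated assumptions, not the algorithm analyzed in the paper: the abstract does not give the algorithm's details, so the uniform random cell-to-processor assignment, the per-direction random delay, the level-based priority, and the names `levels` and `sweep_schedule` are all hypothetical choices. Tasks take unit time, communication cost is ignored, and each direction is assumed to be given as an acyclic edge list over cells `0..n-1`.

```python
import random
from collections import defaultdict


def levels(n, edges):
    """Longest-path level of each cell in one direction's DAG (sources are level 0)."""
    succ = defaultdict(list)
    indeg = [0] * n
    for u, v in edges:
        succ[u].append(v)
        indeg[v] += 1
    level = [0] * n
    stack = [c for c in range(n) if indeg[c] == 0]
    while stack:
        u = stack.pop()
        for v in succ[u]:
            level[v] = max(level[v], level[u] + 1)
            indeg[v] -= 1
            if indeg[v] == 0:  # all predecessors seen, so level[v] is final
                stack.append(v)
    return level


def sweep_schedule(n, dags, m, seed=0):
    """Schedule every (cell, direction) task on m processors in unit time steps.

    Hypothetical sketch: each cell is pinned to one random processor for all
    directions, and each processor runs its ready task of lowest priority value.
    Assumes every edge list in `dags` is acyclic.
    """
    rng = random.Random(seed)
    k = len(dags)
    proc = [rng.randrange(m) for _ in range(n)]   # same processor in every direction
    delay = [rng.randrange(k) for _ in range(k)]  # assumed random offset per direction

    prio, preds_left, succ = {}, {}, defaultdict(list)
    for d, edges in enumerate(dags):
        lev = levels(n, edges)
        for c in range(n):
            prio[(c, d)] = lev[c] + delay[d]
            preds_left[(c, d)] = 0
        for u, v in edges:
            succ[(u, d)].append((v, d))
            preds_left[(v, d)] += 1

    ready = defaultdict(list)  # processor -> tasks whose predecessors are all done
    for task, count in preds_left.items():
        if count == 0:
            ready[proc[task[0]]].append(task)

    start, step = {}, 0
    while len(start) < n * k:
        finished = []
        for p in range(m):
            if ready[p]:
                task = min(ready[p], key=lambda t: (prio[t], t))
                ready[p].remove(task)
                start[task] = step
                finished.append(task)
        # Successors become ready only from the next step onward.
        for task in finished:
            for nxt in succ[task]:
                preds_left[nxt] -= 1
                if preds_left[nxt] == 0:
                    ready[proc[nxt[0]]].append(nxt)
        step += 1
    return proc, start, step  # step is the schedule length (makespan)
```

For example, a 2×2 mesh swept from two opposite corners can be encoded as `dags = [[(0, 1), (0, 2), (1, 3), (2, 3)], [(3, 1), (3, 2), (1, 0), (2, 0)]]` and scheduled with `sweep_schedule(4, dags, m=2)`. The returned `start` map gives the time step of each (cell, direction) task, and `proc` gives the single processor each cell is assigned to across all directions.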