This paper presents a general methodology for the communication-efficient parallelization of graph algorithms using the divide-and-conquer approach and shows that this class of problems can be solved in cluster environments with good communication efficiency. Specifically, the first practical parallel algorithm, based on a general coarse-grained model, for finding Hamiltonian paths in tournaments is presented. On any such parallel machines, this algorithm uses only (3log p+1), where p is the number of processors, communication rounds, which is independent of the tournament size, and can reuse the existing linear-time algorithm in the sequential setting. For theoretical completeness, the algorithm is revised for fine-grained models, where the ratio of computation and communication throughputs is low or the local memory size, $$O(\frac{N}{p})$$ , of each individual processor is extremely limited $$(\frac{N}{p} \ge p^\epsilon,$$ for any $$\epsilon > 0)$$ , solving the problem with O(log p) communication rounds, while the hidden constant grows with the scalability factor 1/?. Experiments have been carried out on a Linux cluster of 32 Sun Ultra5 computers and an SGI Origin 2000 with 32 R10000 processors. The algorithm performance on the Linux Cluster reaches 75% of the performance on the SGI Origin 2000 when the tournament size is about one million.
Read full abstract