Minimizing Communication Penalty of Triangular Solvers by Runtime Mesh Configuration and Workload Redistribution

Dianqin Wang, Eleanor Chu

doi:10.1023/a:1008151330872

Abstract

In this article, we study the effects of network topology and load balancing on the performance of a new parallel algorithm for solving triangular systems of linear equations on distributed-memory message-passing multiprocessors. The proposed algorithm employs novel runtime data mapping and workload redistribution methods on a communication network which is configured as a toroidal mesh. A fully parameterized theoretical model is used to predict communication behaviors of the proposed algorithm relevant to load balancing, and the analytical performance results correctly determine the optimal dimensions of the toroidal mesh, which vary with the problem size, the number of available processors, and the hardware parameters of the machine. Further enhancement to the proposed algorithm is then achieved through redistributing the arithmetic workload at runtime. Our FORTRAN implementation of the proposed algorithm as well as its enhanced version has been tested on an Intel iPSC/2 hypercube, and the same code is also suitable for executing the algorithm on the iPSC/860 hypercube and the Intel Paragon mesh multiprocessor. The actual timing results support our theoretical findings, and they both confirm the very significant impact a network topology chosen at runtime can have on the computational load distribution, the communication behaviors and the overall performance of parallel algorithms.
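The abstract does not spell out the data mapping or the performance model, so the following is only a minimal sketch of the general idea that the shape of a processor mesh affects how triangular-solve work is spread. It assumes a generic 2D cyclic (torus-wrap) assignment of matrix entries to an `rows x cols` mesh, which is an illustrative stand-in and not the authors' mapping. The helper names (`mesh_shapes`, `wrap_owner`, `workload`, `imbalance`) are hypothetical, and the code counts only arithmetic load; it does not model communication cost or hardware parameters.

```python
from collections import Counter


def forward_substitution(L, b):
    """Sequentially solve L x = b for a lower-triangular L (list of rows)."""
    n = len(b)
    x = [0.0] * n
    for i in range(n):
        # Subtract contributions of the already-computed unknowns.
        s = sum(L[i][j] * x[j] for j in range(i))
        x[i] = (b[i] - s) / L[i][i]
    return x


def mesh_shapes(p):
    """Enumerate every (rows, cols) factorization of p processors."""
    return [(r, p // r) for r in range(1, p + 1) if p % r == 0]


def wrap_owner(i, j, rows, cols):
    """Assumed 2D cyclic mapping: matrix entry (i, j) -> mesh coordinate."""
    return (i % rows, j % cols)


def workload(n, rows, cols):
    """Count lower-triangular entries (i >= j) owned by each processor."""
    counts = Counter()
    for i in range(n):
        for j in range(i + 1):
            counts[wrap_owner(i, j, rows, cols)] += 1
    return counts


def imbalance(n, rows, cols):
    """Ratio of the heaviest processor load to the average load (>= 1)."""
    counts = workload(n, rows, cols)
    loads = [counts.get((r, c), 0) for r in range(rows) for c in range(cols)]
    return max(loads) / (sum(loads) / len(loads))


if __name__ == "__main__":
    # Compare the arithmetic load balance of each mesh shape for 16 processors.
    for rows, cols in mesh_shapes(16):
        print(rows, cols, round(imbalance(64, rows, cols), 3))
```

Running the script prints one imbalance ratio per candidate mesh shape. In a fuller model, such a ratio would be combined with a communication cost term before picking the dimensions, as the abstract describes for its parameterized model.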

Journal: The Journal of Supercomputing
Publication Date: Jan 1, 1999
Citations: 29

Similar Papers

Impact of interconnection networks in a massively parallel FPGA architecture on a parallel reduction algorithm
Mouna Baklouti ... Mohamed Abid
01 Dec 2008

A Three-Dimensional Cartesian Mesh Generation Algorithm Based on the GPU Parallel Ray Casting Method
Tiechang Ma ... Tianbao Ma
Applied Sciences | VOL. 10
19 Dec 2019

Parallel Numeric Algorithms On Faster Computers
Scalable Computing Practice and Experience | VOL. 5
03 Jan 2001

Numerical analysis of parallel implementation of the reorthogonalized ABS methods
Szabina Fodor ... Zoltán Németh
Central European Journal of Operations Research | VOL. 27
18 Jun 2018
