Abstract
Algorithms are often parallelized on the basis of data dependence analysis, either manually or by means of parallelizing compilers. Some vector/matrix computations with simple data dependence structures (data parallelism), such as matrix-vector products, can be parallelized easily. For problems with more complicated data dependence structures, parallelization is less straightforward. The data dependence graph is a powerful means for designing and analyzing parallel algorithms. However, for sparse matrix computations, parallelization based solely on exploiting the parallelism already present in an algorithm does not always give satisfactory results. For example, the conventional Gaussian elimination algorithm for the solution of a tridiagonal system is inherently sequential, so algorithms must be designed specifically for parallel computation. After briefly reviewing different parallelization approaches, we introduce a powerful graph formalism for designing parallel algorithms. The formalism is discussed using a tridiagonal system as an example, and its application to general matrix computations is also considered. Its power to produce parallel algorithms beyond the reach of data dependence analysis is demonstrated by means of a new algorithm called ACER (Alternating Cyclic Elimination and Reduction).
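The inherent sequentiality mentioned above can be made concrete with the classical Gaussian elimination recurrence for a tridiagonal system (the Thomas algorithm). The sketch below is illustrative, not taken from the paper; the array convention (sub-diagonal `a`, diagonal `b`, super-diagonal `c`, with `a[0]` and `c[-1]` unused) is an assumption. Note that both sweeps are loop-carried recurrences: step `i` needs the result of step `i-1` (or `i+1`), which is exactly why this algorithm exposes almost no parallelism.

```python
import numpy as np

def thomas(a, b, c, d):
    """Gaussian elimination for a tridiagonal system (Thomas algorithm).
    a: sub-diagonal, b: diagonal, c: super-diagonal, d: right-hand side,
    all of length n; a[0] and c[-1] are unused."""
    n = len(d)
    cp = np.empty(n)
    dp = np.empty(n)
    cp[0] = c[0] / b[0]
    dp[0] = d[0] / b[0]
    # Forward sweep: each step depends on the previous one (sequential).
    for i in range(1, n):
        m = b[i] - a[i] * cp[i - 1]
        cp[i] = c[i] / m
        dp[i] = (d[i] - a[i] * dp[i - 1]) / m
    # Back substitution: again a loop-carried dependence (sequential).
    x = np.empty(n)
    x[-1] = dp[-1]
    for i in range(n - 2, -1, -1):
        x[i] = dp[i] - cp[i] * x[i + 1]
    return x
```

A data dependence analysis of these loops finds a chain of length O(n), so no parallelizing compiler can extract useful parallelism from this formulation of the problem.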
Highlights
Efficient parallelization of a computational problem requires resolving the tradeoff between maximal load balance and minimal communication overhead
The fundamental factor limiting load balance is the parallelism in an algorithm
Because the conventional Gaussian elimination process for a tridiagonal system is known to have very little parallelism, the problem of solving a tridiagonal system of linear equations is a good example for demonstrating the power of the graph formalism
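The paper's ACER algorithm is not specified in this summary, but the flavor of a parallel alternative to sequential elimination can be conveyed by the classical cyclic reduction algorithm, which ACER's name alludes to. The sketch below is a standard textbook construction, not the authors' algorithm, and it assumes n = 2**k - 1 unknowns; within each reduction level every equation is updated independently, so a level is one parallel step and only O(log n) levels are needed.

```python
import numpy as np

def cyclic_reduction(a, b, c, d):
    """Solve a tridiagonal system by cyclic reduction (classical scheme,
    shown for comparison; assumes n = 2**k - 1). a: sub-diagonal,
    b: diagonal, c: super-diagonal, d: right-hand side; a[0], c[-1] unused."""
    n = len(d)
    if n == 1:
        return np.array([d[0] / b[0]])
    a = np.array(a, float); a[0] = 0.0      # copies; boundary terms vanish
    c = np.array(c, float); c[-1] = 0.0
    b = np.asarray(b, float); d = np.asarray(d, float)
    i = np.arange(1, n - 1, 2)              # unknowns kept after this level
    alpha = a[i] / b[i - 1]
    beta = c[i] / b[i + 1]
    # All kept rows are updated independently: one parallel step per level.
    y = cyclic_reduction(-alpha * a[i - 1],
                         b[i] - alpha * c[i - 1] - beta * a[i + 1],
                         -beta * c[i + 1],
                         d[i] - alpha * d[i - 1] - beta * d[i + 1])
    x = np.zeros(n)
    x[i] = y
    ext = np.concatenate(([0.0], x, [0.0]))  # pad so boundary neighbours read 0
    j = np.arange(0, n, 2)                   # eliminated unknowns, recovered
    x[j] = (d[j] - a[j] * ext[j] - c[j] * ext[j + 2]) / b[j]
    return x
```

The point of the contrast is that this O(log n)-depth algorithm is not obtained by analyzing the dependences of Gaussian elimination; it is a different algorithm, of the kind the graph formalism is designed to help discover.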
Summary
Efficient parallelization of a computational problem requires resolving the tradeoff between maximal load balance and minimal communication overhead. We have introduced a more general graph formalism for sparse matrix computations [10,11] that uses a directed graph and whose basic operation is the elimination of a single edge (in contrast to the elimination of a node together with all edges incident on it in the conventional elimination graph). It is based on viewing a sparse matrix computation as a graph transformation problem: the initial graph, representing the initial matrix, is transformed into a terminal configuration representing the desired final matrix form (e.g., a triangular matrix, a diagonal matrix, etc.).
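One way to picture the edge-elimination view is to read ordinary Gaussian elimination as a sequence of single-edge removals. The sketch below is an illustrative interpretation, not the authors' exact construction from [10,11]: each off-diagonal nonzero A[i, j] is an edge i -> j, one edge is eliminated by a single row update (possibly creating fill-in edges), and the terminal configuration here is the edge set of an upper triangular matrix.

```python
import numpy as np

def edges(A, tol=1e-12):
    """Directed graph of A: an edge i -> j for each off-diagonal nonzero."""
    n = len(A)
    return {(i, j) for i in range(n) for j in range(n)
            if i != j and abs(A[i, j]) > tol}

def eliminate_edge(A, i, j):
    """Basic operation, sketched as one row update (an interpretation,
    not necessarily the paper's exact construction): zero A[i, j],
    i.e. remove edge i -> j, by subtracting a multiple of row j.
    Other nonzeros in row j may create new edges out of node i (fill-in)."""
    A = A.astype(float)                      # work on a copy
    A[i, :] -= (A[i, j] / A[j, j]) * A[j, :]
    return A

# Gaussian elimination re-read as a graph transformation: remove every
# edge i -> j with i > j; the terminal configuration is upper triangular.
A = np.array([[4., 1., 0.],
              [1., 4., 1.],
              [0., 1., 4.]])
for j in range(3):
    for i in range(j + 1, 3):
        if (i, j) in edges(A):
            A = eliminate_edge(A, i, j)
```

Because the unit of work is a single edge rather than a whole node, different orderings and groupings of edge eliminations describe different algorithms over the same matrix, which is what makes the formalism a design tool rather than only an analysis tool.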