Abstract

In the present study we introduce and test a new flexible multicomputer (FM) algorithm for matrix calculations on a distributed-memory computer. Besides multiplication, the FM-algorithm performs matrix addition, subtraction, and scalar multiplication on both dense and sparse matrices. The FM-algorithm was designed to meet the need for a high-performance, flexible software tool for implementing different parallel optimization algorithms, and special consideration has been given to its usability and portability. A preliminary flexibility test is conducted on an IBM SP2 (Cactus) machine. At the theoretical level, we compare the FM-algorithm with another high-performance algorithm, SUMMA, and consider an improvement of SUMMA obtained by combining it with the Strassen algorithm. At the empirical level, we compare a chained version of the FM-algorithm with the parallel ScaLAPACK code on a set of huge matrix multiplications on a Cray T3E machine. Our results demonstrate that the FM-algorithm performs as well as the parallel ScaLAPACK code for dense matrices, and that FM is fully scalable for large, sparse matrices. The FM-algorithm is efficient relative to sequential matrix multiplication. In contrast to ScaLAPACK, the fully scalable FM-algorithm is independent of the mesh structure, and arbitrarily large matrices can be processed even on a single processor.

Scope and purpose

In this paper we introduce a flexible multicomputer (FM) algorithm for elementary matrix calculations on a distributed-memory computer. As shown in the study, the algorithm is fully scalable in matrix multiplication and works as fast as the ScaLAPACK parallel matrix multiplication algorithm on a Cray T3E machine. A distinctive feature of the FM-algorithm is its independence of the number of available processors: contrary to the ScaLAPACK code, the algorithm can process arbitrarily large matrices already on a single processor.
This is possible because the original huge input matrices need not reside in the main memory of the nodes. The FM-algorithm includes easy-to-use facilities for sparse matrix computations, in which blocks of zeros are eliminated from the calculation and communication tasks. The sparse matrix calculation is based on an elimination technique in which zero blocks in one input matrix, and the corresponding block pairs from the other, are excluded from data transmission and/or computation. The technique allows a simple cell representation and significantly simplifies information passing between nodes. In comparison, the usual technique of representing each matrix cell by three numbers (the position coordinates and the cell value) leads to complicated data structures that must be recognized and updated during computation. The ability to represent specific nonzero patterns exactly comes at the expense of considerable communication overhead and, in most cases, little practical gain. The algorithm provides new avenues for high-performance computing, e.g., in the field of mathematical programming, where elementary matrix operations of huge dimensions are involved when solving intricate problems in engineering and economics. The FM-algorithm was designed both for architectures with shared disk memory and for architectures where the data has to be transmitted between the processors. Naturally, in the latter setting, the scalability of any matrix processing code is significantly reduced or eliminated, since the communication overhead increases drastically with the problem size. However, even in this situation, the FM-algorithm needs only limited communication between nodes.
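The abstract does not give the block-elimination scheme in detail; the following sketch (all names hypothetical, not the authors' code) illustrates the idea in serial form. Each matrix is stored as a dictionary mapping block coordinates to dense blocks; absent keys represent all-zero blocks, so a zero block in one input and its paired blocks in the other are skipped without any coordinate-per-cell bookkeeping:

```python
import numpy as np

def block_sparse_matmul(A_blocks, B_blocks, n_blocks):
    """Multiply two block matrices stored as dicts mapping (i, k) -> dense
    NumPy block. Missing keys are all-zero blocks: they are never stored,
    transmitted, or multiplied, mirroring the elimination technique."""
    C_blocks = {}
    for (i, k), a in A_blocks.items():       # iterate only over nonzero blocks of A
        for j in range(n_blocks):
            b = B_blocks.get((k, j))         # paired block of B; None means zero
            if b is None:
                continue                     # zero pair: no work, no communication
            if (i, j) in C_blocks:
                C_blocks[(i, j)] += a @ b
            else:
                C_blocks[(i, j)] = a @ b
    return C_blocks
```

In a distributed setting, only the dense blocks present in the dictionaries would be sent between nodes, which is where the savings in communication volume come from.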
