Abstract

In the present study we introduce and test a new flexible multicomputer (FM) algorithm for matrix calculations on a distributed-memory computer. Besides multiplication, the FM-algorithm performs matrix addition, subtraction, and scalar multiplication on both dense and sparse matrices. The FM-algorithm was designed to meet the need for a high-performance, flexible software tool for implementing different parallel optimization algorithms, and special consideration has been given to its usability and portability. A preliminary flexibility test is conducted on an IBM SP2 (Cactus) machine. At the theoretical level, we compare the FM-algorithm with another high-performance algorithm, SUMMA, and consider an improvement of SUMMA obtained by combining it with the Strassen algorithm. At the empirical level, we compare a chained version of the FM-algorithm with the parallel ScaLAPACK code on a set of huge matrix multiplications on a Cray T3E machine. Our results demonstrate that the FM-algorithm performs as well as the parallel ScaLAPACK code for dense matrices, and that FM is fully scalable for large, sparse matrices. The FM-algorithm is efficient relative to sequential matrix multiplication. In contrast to ScaLAPACK, the fully scalable FM-algorithm is independent of the mesh structure, and arbitrarily large matrices can be processed even on a single processor.

Scope and purpose

In this paper we introduce a flexible multicomputer (FM) algorithm for elementary matrix calculations on a distributed-memory computer. As shown in the study, the algorithm is fully scalable in matrix multiplication and works as fast as the ScaLAPACK parallel matrix multiplication algorithm on a Cray T3E machine. A distinctive feature of the FM-algorithm is its independence of the number of available processors: contrary to the ScaLAPACK code, the algorithm can process arbitrarily large matrices already on a single processor.
This is possible because the original huge input matrices need not reside in the main memory of the nodes. The FM-algorithm includes easy-to-use facilities for sparse matrix computations, in which blocks of zeros are eliminated from the calculation and communication tasks. The sparse matrix calculation is based on an elimination technique in which zero blocks in one input matrix, and the corresponding block pairs from the other, are excluded from data transmission and/or computation. The technique allows a simple cell representation and significantly simplifies information passing between nodes. In comparison, the usual technique of representing each matrix cell by three numbers (the position coordinates and the cell value) leads to complicated data structures that must be recognized and updated during computation. The ability to represent specific nonzero patterns exactly comes at the expense of considerable communication overhead and, in most cases, little practical gain. The algorithm provides new avenues for high-performance computing, e.g., in the field of mathematical programming, where elementary matrix operations of huge dimensions are involved when solving intricate problems in engineering and economics. The FM-algorithm was designed both for architectures with shared disk memory and for architectures where the data has to be transmitted between the processors. Naturally, in the latter setting, the scalability of any matrix processing code is significantly reduced or eliminated, since the communication overhead increases drastically with the problem size. However, even in this situation, the FM-algorithm needs only limited communication between nodes.
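The abstract does not give the block-elimination scheme in detail; the following sketch (all names hypothetical, not the authors' code) illustrates the idea in serial form. Each matrix is stored as a dictionary mapping block coordinates to dense blocks; absent keys represent all-zero blocks, so a zero block in one input and its paired blocks in the other are skipped without any coordinate-per-cell bookkeeping:

```python
import numpy as np

def block_sparse_matmul(A_blocks, B_blocks, n_blocks):
    """Multiply two block matrices stored as dicts mapping (i, k) -> dense
    NumPy block. Missing keys are all-zero blocks: they are never stored,
    transmitted, or multiplied, mirroring the elimination technique."""
    C_blocks = {}
    for (i, k), a in A_blocks.items():       # iterate only over nonzero blocks of A
        for j in range(n_blocks):
            b = B_blocks.get((k, j))         # paired block of B; None means zero
            if b is None:
                continue                     # zero pair: no work, no communication
            if (i, j) in C_blocks:
                C_blocks[(i, j)] += a @ b
            else:
                C_blocks[(i, j)] = a @ b
    return C_blocks
```

In a distributed setting, only the dense blocks present in the dictionaries would be sent between nodes, which is where the savings in communication volume come from.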
