Architecture independent parallel algorithm design: theory vs practice

Alexandros V Gerbessiotis

doi:10.1016/s0167-739x(01)00068-1

Abstract

We propose architecture independent parallel algorithm design as a framework for writing parallel code that is scalable, portable and reusable. To this end we study the performance of some dense matrix computations such as matrix multiplication, LU decomposition and matrix inversion. Although optimized algorithms for these problems have been examined extensively, a systematic study of the architecture independent design and analysis of parallel algorithms and of their performance (including matrix computations) has not been undertaken. More refined algorithms and implementations (sequential or parallel) exist for the stated problems, but the complexity and performance of the introduced algorithms are sufficient to raise the issues that matter in architecture independent parallel algorithm design. Two established distributions of an input matrix among the processors of a parallel machine are examined, and the theoretical and practical merits of each are discussed. The proposed algorithms have been implemented and tested on a variety of parallel systems, including the SGI Power Challenge, the IBM SP2 and the Cray T3D. Our experimental results support our claims of efficiency, portability and reusability of the presented algorithms.
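
The abstract does not name the two matrix distributions it compares. As a rough illustration only, the sketch below assumes two common choices, a block and a cyclic assignment of matrix rows to processors, and counts how many rows each processor still has to update at elimination step k of an LU decomposition. The function names (block_owner, cyclic_owner, active_rows_per_proc) are hypothetical and are not taken from the paper.

```python
from typing import Callable, List


def block_owner(i: int, n: int, p: int) -> int:
    """Owner of row i when n rows are split into p contiguous blocks (assumed scheme)."""
    rows_per_proc = -(-n // p)  # ceil(n / p)
    return i // rows_per_proc


def cyclic_owner(i: int, p: int) -> int:
    """Owner of row i when rows are dealt out round-robin (assumed scheme)."""
    return i % p


def active_rows_per_proc(owner_of: Callable[[int], int], n: int, p: int, k: int) -> List[int]:
    """Count, per processor, the rows with index >= k.

    In row-oriented LU elimination, only rows at or below the pivot row k
    are still touched, so these counts approximate the work per processor.
    """
    counts = [0] * p
    for i in range(k, n):
        counts[owner_of(i)] += 1
    return counts


if __name__ == "__main__":
    n, p, k = 8, 4, 4
    print("block :", active_rows_per_proc(lambda i: block_owner(i, n, p), n, p, k))
    print("cyclic:", active_rows_per_proc(lambda i: cyclic_owner(i, p), n, p, k))
```

With 8 rows on 4 processors at step k = 4, the block assignment leaves two processors idle while the cyclic one keeps all four busy. This is one kind of trade-off a comparison of distributions might weigh, though whether the paper frames it this way is an assumption.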

Journal: Future Generation Computer Systems
Publication Date: Jan 1, 2002
Citations: 5

Similar Papers

A flexible multicomputer algorithm for elementary matrix operations
Ralf Östermark
Computers and Operations Research | VOL. 27
29 Dec 1999

Architecture Aware Programming on Multi-Core Systems
M R ... S.R Sathe
International Journal of Advanced Computer Science and Applications | VOL. 2
01 Jan 2010

Parallelization of a block tridiagonal solver in HPF on an IBM SP2
Auke Van Der Ploeg
01 Jan 1998

SRUMMA: a matrix multiplication algorithm suitable for clusters and scalable shared memory systems
M Krishnan ... J Nieplocha
26 Apr 2004
