IBM SP2 Research Articles

Three paradigms for distributed-memory parallel computation that free the application programmer from the details of message passing are compared for an archetypal structured scientific computation -- a nonlinear, structured-grid partial differential equation boundary value problem -- using the same algorithm on the same hardware. All of the paradigms -- parallel languages represented by the Portland Group's HPF, (semi-)automated serial-to-parallel source-to-source translation represented by CAPTools from the University of Greenwich, and parallel libraries represented by Argonne's PETSc -- are found to be easy to use for this problem class, and all are reasonably effective in exploiting concurrency after a short learning curve. The level of involvement required of the application programmer under any paradigm includes specification of the data partitioning, corresponding to a geometrically simple decomposition of the domain of the PDE. Programming in SPMD style for the PETSc library requires writing only the routines that discretize the PDE and its Jacobian, managing subdomain-to-processor mappings (affine global-to-local index mappings), and interfacing to library solver routines. Programming for HPF requires a complete sequential implementation of the same algorithm as a starting point, introduction of concurrency through subdomain blocking (a task similar to the index mapping), and modest experimentation with rewriting loops to expose latent concurrency to the compiler. Programming with CAPTools involves feeding the same sequential implementation to the CAPTools interactive parallelization system and guiding the source-to-source code transformation by responding to various queries about quantities knowable only at runtime. Results representative of the state of practice for a scaled sequence of structured-grid problems are given on three of the most important contemporary high-performance platforms: the IBM SP, the SGI Origin 2000, and the Cray T3E.
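
The affine global-to-local index mapping mentioned in the abstract can be illustrated with a minimal sketch. This is not code from the paper or from PETSc; it assumes a one-dimensional block decomposition of n grid points over nprocs processes, and the function names (block_range, global_to_local, local_to_global) are hypothetical. A structured grid in two or three dimensions would apply the same mapping independently along each axis.

```python
def block_range(n, nprocs, rank):
    """Return the half-open global index range [start, stop) owned by `rank`.

    The first (n mod nprocs) ranks receive one extra point, so block sizes
    differ by at most one.
    """
    base, extra = divmod(n, nprocs)
    start = rank * base + min(rank, extra)
    stop = start + base + (1 if rank < extra else 0)
    return start, stop


def global_to_local(i, start):
    """Affine map from a global index to a local index on the owning rank."""
    return i - start


def local_to_global(j, start):
    """Inverse affine map from a local index back to the global index."""
    return j + start


if __name__ == "__main__":
    n, nprocs = 10, 3
    for rank in range(nprocs):
        start, stop = block_range(n, nprocs, rank)
        print(rank, start, stop, [global_to_local(i, start) for i in range(start, stop)])
```

The mapping is affine because each local index differs from its global index only by the constant offset start of the owning block.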

Fortran 90 provides a rich set of array intrinsic functions, each of which operates concurrently on the elements of multi-dimensional array objects. They provide a rich source of parallelism and play an increasingly important role in automatic support of data-parallel programming. However, no such support exists when these intrinsic functions are applied to sparse data sets. In this paper, we address this gap by presenting an efficient library for parallel sparse computations with Fortran 90 array intrinsic operations. Our method provides both compression schemes and distribution schemes for distributed-memory environments, applicable to higher-dimensional sparse arrays. This way, programmers need not worry about low-level system details when developing sparse applications. Sparse programs can be expressed concisely using array expressions and parallelized with the help of our library. Our sparse libraries are built for the array intrinsics of Fortran 90, and they include an extensive set of array operations such as CSHIFT, EOSHIFT, MATMUL, MERGE, PACK, SUM, RESHAPE, SPREAD, TRANSPOSE, UNPACK, and section moves. Our work is, to the best of our knowledge, the first to provide sparse and parallel sparse support for the array intrinsics of Fortran 90. In addition, we provide a complete complexity analysis of our sparse implementation. The complexity of our algorithms is proportional to the number of nonzero elements in the arrays, which is consistent with the conventional design criteria for sparse algorithms and data structures. Our current testbed is an IBM SP2 workstation cluster. Preliminary experimental results with numerical routines, numerical applications, and data-intensive applications related to OLAP (on-line analytical processing) show that our approach is promising in speeding up sparse matrix computations in both sequential and distributed-memory environments if the programs are expressed with Fortran 90 array expressions.
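
To make concrete the idea of an intrinsic whose cost is proportional to the number of nonzeros, here is a minimal sketch of sparse analogues of SUM and CSHIFT for a one-dimensional array. It is not the library described in the abstract: the dictionary-based coordinate storage and the names sparse_sum and sparse_cshift are assumptions made for illustration only, and indices are zero-based rather than Fortran's one-based default.

```python
def sparse_sum(nz):
    """SUM over a sparse vector stored as {index: value}; touches only nonzeros."""
    return sum(nz.values())


def sparse_cshift(nz, n, shift):
    """Circular shift of a length-n sparse vector, following Fortran CSHIFT.

    In Fortran, a positive shift moves elements toward lower indices, so with
    zero-based indexing result[i] = array[(i + shift) % n]. Equivalently, the
    nonzero stored at index j moves to (j - shift) % n. Only the stored
    entries are visited, so the work is proportional to the nonzero count.
    """
    return {(j - shift) % n: v for j, v in nz.items()}


def to_dense(nz, n):
    """Expand to a dense list, used here only to inspect results."""
    return [nz.get(i, 0) for i in range(n)]


if __name__ == "__main__":
    vec = {1: 5, 4: 7}  # dense form: [0, 5, 0, 0, 7]
    print(sparse_sum(vec))
    print(to_dense(sparse_cshift(vec, 5, 1), 5))
```

Neither function reads the zero entries, which is the property the abstract's complexity claim refers to; a distributed version would additionally need the compression and distribution schemes the authors describe.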

Articles published on IBM SP2

Prefix Computations on Symmetric Multiprocessors

Parallel PIC plasma simulation through particle decomposition techniques

Parallel Compositional Reservoir Simulation on Clusters of PCs

Skew-insensitive Parallel Algorithms for Relational Join

Search engine case study: searching the web using genetic programming and MPI

Parallel Implementation of a Central Decomposition Method for Solving Large-Scale Planning Problems

Parallel Implementation of a Class of Adaptive Signal Processing Applications

Spatial and temporal data parallelization of the H.261 video coding algorithm

High-performance file I/O in Java: Existing approaches and bulk I/O extensions

Three parallel programming paradigms: comparisons on an archetypal PDE computation

Overlapping Communication and Computation with OpenMP and MPI

A Fully Asynchronous Multifrontal Solver Using Distributed Dynamic Scheduling

Parallel Sparse Supports for Array Intrinsic Functions of Fortran 90

Characterizing land surface anisotropy from AVHRR data at a global scale using high performance computing

Parallel processing of adaptive meshes with load balancing

Scheduling loops with partial loop-carried dependencies

Design of parallel algorithms for the single resource allocation problem

Multi-scale meshfree parallel computations for viscous, compressible flows

Compiler Algorithms for Optimizing Locality and Parallelism on Shared and Distributed-Memory Machines

Solving algebraic Riccati equations on parallel computers using Newton's method with exact line search
