Distributed-memory parallel processors (DMPPs) can deliver peak performance higher than vector supercomputers while promising a better cost-performance ratio. Programming, however, is harder than on traditional vector systems, especially when problems necessitating unstructured solution methods are considered. A class of such applications, with large resource requirements, is the numerical solution of partial differential equations (PDEs) on nonuniformly refined three-dimensional finite element discretizations. Porting an application of this class from vector and shared-memory parallel machines to DMPPs involves some fundamental algorithm changes, such as grid decomposition, mapping, and coloring strategies. In addition, no standardized language interface is available to ease the efficient parallelization and porting among DMPPs and between vector computers and DMPPs. This article describes how PILS-an existing package for the iterative solution of large unstructured sparse linear systems of equations on vector computers-was ported to DMPPs, using the parallelizing Fortran compiler Oxygen. Two DMPPs, namely an Intel Paragon and a Fujitsu AP1000, were used to evaluate the performance of the generated parallel program quantitatively. The results indicate how an application should be designed to be portable among supercomputers of different architecture. Several language and architecture features are essential for such a porting process and ease the parallelization of similar applications drastically.
Read full abstract