Abstract

This paper summarizes a strategy for parallelizing a legacy Fortran 77 program using the object-oriented (OO) and coarray features that entered Fortran in the 2003 and 2008 standards, respectively. OO programming (OOP) facilitates the construction of an extensible suite of model-verification and performance tests that drive the development. Coarray parallel programming facilitates a rapid evolution from a serial application to a parallel application capable of running on multicore processors and many-core accelerators in shared and distributed memory. We delineate 17 code modernization steps used to refactor and parallelize the program and study the resulting performance. Our initial studies were done using the Intel Fortran compiler on a 32-core shared-memory server. Scaling was very poor, and profile analysis using TAU showed that the performance bottleneck was our implementation of a collective, sequential summation procedure. We improved the scalability and achieved nearly linear speedup by replacing the sequential summation with a parallel, binary-tree algorithm. We also tested the Cray compiler, which provides its own collective summation procedure; the Intel compiler provides no collective reductions. With Cray, the program shows linear speedup even in distributed-memory execution. We anticipate similar results with other compilers once they support the new collective procedures proposed for Fortran 2015.
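The difference between the two summation strategies can be sketched by counting dependent addition steps. The following serial Python model is an illustration only, not the authors' coarray code: each list element stands in for the partial sum held by one image, and the function names `sequential_sum` and `tree_sum` are hypothetical. The sequential version performs n - 1 dependent additions on a single image, while the binary tree needs only ceil(log2 n) rounds, and the additions within each round are independent and could run concurrently on different images.

```python
# Serial model of two collective-sum strategies (hypothetical names).
# Each list element stands in for the partial sum held by one coarray image.


def sequential_sum(partials):
    """One image adds every other image's partial sum in turn.

    Returns (total, steps), where steps counts the n - 1 dependent additions.
    """
    total = partials[0]
    steps = 0
    for value in partials[1:]:
        total += value
        steps += 1
    return total, steps


def tree_sum(partials):
    """Pairwise (binary-tree) reduction.

    Returns (total, rounds), where rounds is ceil(log2 n). All additions inside
    one round touch disjoint pairs, so they could execute in parallel.
    """
    values = list(partials)
    n = len(values)
    rounds = 0
    stride = 1
    while stride < n:
        # Position i (a multiple of 2*stride) absorbs the value at i + stride.
        for i in range(0, n - stride, 2 * stride):
            values[i] += values[i + stride]
        stride *= 2
        rounds += 1
    return values[0], rounds


if __name__ == "__main__":
    partials = [float(i) for i in range(1, 33)]  # 32 hypothetical images
    print(sequential_sum(partials))  # (528.0, 31)
    print(tree_sum(partials))        # (528.0, 5)
```

With 32 images the tree needs 5 rounds instead of 31 serialized additions, which is the kind of reduction in critical-path length that lets a collective sum scale. In Fortran 2018 (the standard referred to above by its working name, Fortran 2015), the intrinsic `co_sum` collective performs this reduction and leaves the choice of algorithm to the compiler.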

Highlights

  • This paper presents two strategies for parallelizing a legacy Fortran code while bolstering trust in the result: (1) a test-driven approach that verifies the numerical results and the performance relative to the original code and (2) an evolutionary approach that leaves much of the original code intact while offering a clear path to execution on multicore and many-core architectures in shared and distributed memory

  • Modern high-performance computing (HPC) software must be executed on multicore processors or many-core accelerators in shared or distributed memory

  • We examined the scaling performance of a parallel implementation of the Particle Representation Model (PRM) using Cray hardware and the Cray Fortran compiler, which has excellent support for distributed-memory execution of coarray programs


Summary

Introduction

Achee and Carver [1] examined object extraction, which involves identifying candidate objects by analyzing the data flow in Fortran 77 code. They define a cohesion metric that they use to group.

In Fortran, giving procedures explicit interfaces facilitates compiler checks on argument type, kind, and rank. New capabilities they introduced included dynamic memory allocation.

Most commercial software models for turbulent flow in engineering devices solve the Reynolds-averaged Navier-Stokes (RANS) partial differential equations. Deriving these equations involves decomposing the fluid velocity field, $u$, into a mean part, $\bar{u}$, and a fluctuating part, $u'$:

$$u = \bar{u} + u'$$

Here, $t$ is the time coordinate; $u_i$ and $u_j$ are the $i$th and $j$th Cartesian components of $u$; and $x_i$ and $x_j$ are the $i$th and $j$th Cartesian components of the spatial coordinate $x$.
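The velocity decomposition can be made concrete with a short numerical example. This Python sketch assumes the mean is the arithmetic average of a sampled record, a discrete stand-in for the time or ensemble average used in RANS; the function names `reynolds_decompose` and `reynolds_stress` are hypothetical. It shows that the fluctuation averages to zero, while the average of a product of fluctuations, such as the mean of $u_i' u_j'$, generally does not. That nonzero term (the Reynolds stress) is what averaging the nonlinear terms leaves behind in the RANS equations.

```python
def reynolds_decompose(samples):
    """Split a sampled velocity record into mean and fluctuating parts.

    Assumption: the mean is the arithmetic average of the samples.
    Returns (u_bar, u_prime) with u = u_bar + u_prime sample by sample.
    """
    u_bar = sum(samples) / len(samples)
    u_prime = [u - u_bar for u in samples]
    return u_bar, u_prime


def reynolds_stress(u_samples, v_samples):
    """Average of the product of two velocity components' fluctuations."""
    _, u_prime = reynolds_decompose(u_samples)
    _, v_prime = reynolds_decompose(v_samples)
    return sum(a * b for a, b in zip(u_prime, v_prime)) / len(u_prime)


if __name__ == "__main__":
    u = [2.0, 3.0, 1.0, 2.0]   # hypothetical samples of one velocity component
    v = [1.0, 2.0, 0.0, 1.0]   # hypothetical samples of another component
    u_bar, u_prime = reynolds_decompose(u)
    print(u_bar)                             # 2.0
    print(u_prime)                           # [0.0, 1.0, -1.0, 0.0]
    print(sum(u_prime) / len(u_prime))       # 0.0: fluctuations average to zero
    print(reynolds_stress(u, v))             # 0.5: correlated fluctuations do not
```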

Figure 1
Methodology
Extensible OO Test Suite
Coarray Parallelization
Results
Ease of Use
Conclusions and Future Work