The IBM ASTI optimizer provides the foundation for high-order transformations and automatic shared-memory parallelization in the latest IBM XL FORTRAN (XLF) compilers for RS/6000™ and PowerPC® uniprocessors and symmetric multiprocessors (SMPs), and for automatic distributed-memory parallelization in the IBM XL High-Performance FORTRAN (XLHPF) compiler for the SP2™ distributed-memory multiprocessor. In this paper, we describe how the transformer component of the ASTI optimizer automatically selects high-order transformations for a given input program and target uniprocessor, so as to improve utilization of the memory hierarchy (including cache and registers) and instruction-level parallelism. Our solution takes a quantitative approach in which optimization problems are formulated in terms of cost models. The loop and data transformations currently employed by the ASTI transformer for optimizing uniprocessor performance are loop distribution, loop interchange, loop reversal, loop skewing, loop tiling/blocking (with compiler-selected tile sizes), loop fusion, unrolling of multiple loops (with compiler-selected unroll factors), and scalar replacement of selected array references. The design and initial implementation of the ASTI optimizer were completed between 1991 and 1993. To the best of our knowledge, the ASTI transformer is the first system to perform automatic selection of this wide range of transformations using a cost-based framework.
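
As a rough illustration of what cost-based selection of a transformation can look like, the sketch below picks the innermost loop of a matrix-multiply nest by estimating how many cache lines each array reference touches during one full run of the innermost loop. It is a toy model written for this illustration, not the ASTI cost model: the names `REFS`, `lines_touched`, `innermost_cost`, and `choose_innermost` are hypothetical, arrays are assumed to be stored in Fortran column-major order, and all loops are assumed to run `n` iterations with unit stride.

```python
from math import ceil

# Array references of C(i,j) = C(i,j) + A(i,k) * B(k,j).
# Each tuple lists the loop variable used in each subscript; the leftmost
# subscript is contiguous in memory (Fortran column-major order, assumed).
REFS = {"C": ("i", "j"), "A": ("i", "k"), "B": ("k", "j")}


def lines_touched(ref, inner, n, line_elems):
    """Estimate cache lines one reference touches over n innermost iterations."""
    if inner not in ref:
        return 1                      # loop-invariant: the same line every time
    if ref[0] == inner:
        return ceil(n / line_elems)   # unit stride: consecutive elements share lines
    return n                          # large stride: a new line on every iteration


def innermost_cost(inner, n, line_elems, refs=REFS):
    """Total estimated cache lines touched if `inner` is the innermost loop."""
    return sum(lines_touched(r, inner, n, line_elems) for r in refs.values())


def choose_innermost(loops, n, line_elems, refs=REFS):
    """Pick the loop whose placement innermost minimizes the estimated cost."""
    return min(loops, key=lambda v: innermost_cost(v, n, line_elems, refs))


if __name__ == "__main__":
    for v in ("i", "j", "k"):
        print(v, innermost_cost(v, n=512, line_elems=16))
    print("innermost:", choose_innermost(["i", "j", "k"], n=512, line_elems=16))
```

Under these assumptions, with `n = 512` and 16 elements per cache line, the model estimates 65 lines with `i` innermost, 1025 with `j`, and 545 with `k`, so it would interchange the loops to place `i` innermost. A production cost model would also have to account for legality of the interchange, reuse across outer loops, and register pressure, none of which this sketch attempts.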