Abstract

In this article, we present a program generation strategy for Strassen's matrix multiplication algorithm using a programming methodology based on tensor product formulas. In this methodology, block recursive programs such as the fast Fourier transform and Strassen's matrix multiplication algorithm are expressed as algebraic formulas involving tensor products and other matrix operations. Such formulas can be systematically translated to high-performance parallel/vector codes for various architectures. We present a nonrecursive implementation of Strassen's algorithm for shared memory vector processors such as the Cray Y-MP. A previous implementation of Strassen's algorithm synthesized from tensor product formulas required working storage of size O(7^n) for multiplying 2^n × 2^n matrices. We present a modified formulation in which the working storage requirement is reduced to O(4^n). The modified formulation exhibits sufficient parallelism for efficient implementation on a shared memory multiprocessor. Performance results on a Cray Y-MP8/64 are presented.

Highlights

  • Tensor products (Kronecker products) have been used to model algorithms with a recursive computational structure that occur in application areas such as digital signal processing [6, 15], image processing [16], linear system design [5], and statistics [7]

  • A programming methodology based on tensor products has been successfully used to design and implement high-performance algorithms to compute fast Fourier transforms (FFT) [12, 14] and matrix multiplication

  • We describe the tensor product formulation of Strassen's matrix multiplication algorithm, and discuss program generation for shared memory vector processors such as the Cray Y-MP


Summary

INTRODUCTION

Tensor products (Kronecker products) have been used to model algorithms with a recursive computational structure that occur in application areas such as digital signal processing [6, 15], image processing [16], linear system design [5], and statistics [7]. A set of multilinear algebra operations, such as tensor product and matrix multiplication, is used to express block recursive algorithms. These algebraic operations can be systematically translated into high-level programming language constructs such as sequential composition, iteration, and parallel/vector operations. Tensor product formulas representing an algorithm can be algebraically manipulated to restructure the computation and achieve different performance characteristics. We describe the tensor product formulation of Strassen's matrix multiplication algorithm, and discuss program generation for shared memory vector processors such as the Cray Y-MP. A formulation of Strassen's algorithm using this notation is presented in Section 3, along with a discussion of how the formulation can be modified to reduce working storage.
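To make the translation from tensor product formulas to program constructs concrete, the following is a minimal Python sketch, not the code generator described in the article. It assumes the standard interpretation in which (I_n ⊗ A) x becomes a loop that applies A to n contiguous blocks of x, and (A ⊗ I_n) x becomes A applied to length-n vectors (the innermost loop is the vectorizable operation). All function names here are hypothetical and chosen for illustration only.

```python
def kron(a, b):
    """Dense Kronecker (tensor) product of two matrices stored as lists of rows."""
    return [[x * y for x in row_a for y in row_b]
            for row_a in a for row_b in b]


def matvec(m, v):
    """Plain matrix-vector product, used as the reference computation."""
    return [sum(x * y for x, y in zip(row, v)) for row in m]


def identity(n):
    return [[1 if i == j else 0 for j in range(n)] for i in range(n)]


def apply_i_tensor_a(n, a, x):
    """y = (I_n (x) A) x: iteration, A applied to n contiguous blocks of x."""
    k = len(a[0])  # input block length = number of columns of A
    y = []
    for i in range(n):  # the n iterations are independent (parallel loop)
        y.extend(matvec(a, x[i * k:(i + 1) * k]))
    return y


def apply_a_tensor_i(a, n, x):
    """y = (A (x) I_n) x: A applied to vectors of length n."""
    rows, cols = len(a), len(a[0])
    y = [0] * (rows * n)
    for r in range(rows):
        for c in range(cols):
            coeff = a[r][c]
            for j in range(n):  # one length-n vector multiply-add
                y[r * n + j] += coeff * x[c * n + j]
    return y


if __name__ == "__main__":
    a = [[1, 2], [3, 4]]
    x = [1, 2, 3, 4, 5, 6]
    print(apply_i_tensor_a(3, a, x))  # [5, 11, 11, 25, 17, 39]
    print(apply_a_tensor_i(a, 3, x))  # [9, 12, 15, 19, 26, 33]
```

Neither loop version ever forms the large Kronecker product matrix; `kron` and `matvec` are included only so the loop forms can be checked against the dense definition.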

AN OVERVIEW OF THE TENSOR PRODUCT NOTATION
A TENSOR PRODUCT FORMULATION OF STRASSEN'S ALGORITHM
Combining Breadth-First and Depth-First Evaluations
Matrix Storage in Main Memory
CODE GENERATION FOR VECTOR PROCESSORS
Memory Management for Depth-First Evaluation
PERFORMANCE RESULTS ON THE CRAY Y-MP
CONCLUSIONS