The performance that a parallel architecture achieves on a complete application is determined by the combination of its hardware and software modules. By hardware we mean node processing power and network parameters; software covers everything from the optimization capabilities of the compiler to the high-level programming model. These components interact in a non-trivial way, delivering variable results for different problem sizes and making performance prediction very difficult. Performance becomes predictable once a given algorithm can be parameterized in terms of the floating-point operations it needs, its bandwidth and latency requirements, the granularity of the problem itself, and a few machine-dependent parameters. We address the problem of predicting performance for a large class of regular synchronous problems on rectangular grids (only 2D in this paper). The aim of the paper is to determine all the machine-dependent parameters by means of small dedicated benchmarking kernels. These parameters are then used to predict and compare, over a very wide range of data set sizes, the performance of the Connection Machine CM-5, the Cray T3D and the IBM SP2 on a simple but complete application: the Conjugate Gradient solution of the Poisson equation. We show that the parameterization can be done quite accurately for all of the studied platforms, so that the behavior of an MPP can be predicted over a very wide range of parameters from measurements performed on extremely simple kernels together with some algorithmic understanding. We argue in favor of adopting this methodology to produce meaningful benchmarks of MPP platforms. © 1998 John Wiley & Sons, Ltd.
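As an illustration of the kind of parameterization the abstract describes, the following minimal sketch estimates the time of one sweep over a 2D grid from a flop count, a per-message latency and a link bandwidth. It is not the model used in the paper: the names `Machine` and `predict_sweep_time`, the square domain decomposition, the four-neighbour halo exchange and the default of 10 flops per grid point are all assumptions made for this example, and it ignores the global reductions that a Conjugate Gradient iteration also needs.

```python
import math
from dataclasses import dataclass


@dataclass
class Machine:
    """Hypothetical machine-dependent parameters, as a benchmark kernel might measure them."""
    flop_rate: float   # sustained floating-point operations per second, per node
    latency: float     # start-up cost of one message, in seconds
    bandwidth: float   # point-to-point bandwidth, in bytes per second


def predict_sweep_time(n, p, machine, flops_per_point=10.0, word_bytes=8):
    """Estimate the time of one sweep over an n x n grid split across p nodes.

    Assumes p is a perfect square and each node owns a square subgrid,
    exchanging one boundary row or column with each of four neighbours.
    """
    side = n / math.sqrt(p)                       # edge length of the local subgrid
    compute = flops_per_point * side * side / machine.flop_rate
    if p == 1:
        return compute                            # no communication on a single node
    per_message = machine.latency + side * word_bytes / machine.bandwidth
    return compute + 4 * per_message              # four halo messages per sweep


# Illustrative values only, not measurements of any real platform.
example = Machine(flop_rate=1e8, latency=1e-5, bandwidth=1e8)
for grid in (64, 256, 1024):
    print(grid, predict_sweep_time(grid, 16, example))
```

Sweeping the grid size in this way shows how the balance between computation and communication shifts with granularity, which is why a prediction has to cover a wide range of data set sizes rather than a single one.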