Abstract

We have developed a methodology for predicting the performance of parallel algorithms on real parallel machines. The methodology consists of two steps. First, we characterize a machine by enumerating the primitive operations that it is capable of performing along with the cost of each operation. Next, we analyze an algorithm by making a precise count of the number of times the algorithm performs each type of operation. We have used this methodology to evaluate many of the parallel sorting algorithms proposed in the literature. Of these, we selected the three most promising (Batcher's bitonic sort, a parallel radix sort, and a sample sort similar to Reif and Valiant's flashsort) and implemented them on the Connection Machine model CM-2. This paper analyzes the three algorithms in detail and discusses the issues that led us to our particular implementations. On the CM-2, the predicted performance of the algorithms closely matches the observed performance, and hence our methodology can be used to tune the algorithms for optimal performance. Although our programs were designed for the CM-2, our conclusions about the merits of the three algorithms apply to other parallel machines as well.
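
The two-step methodology can be illustrated with a minimal Python sketch. It is not the authors' model: the operation names (`arithmetic`, `send`), the cost values, and the helper functions are hypothetical placeholders chosen for illustration, and the costs are not measured CM-2 figures. The sketch assumes one key per processor and uses the standard stage count of a bitonic sorting network on n = 2^k keys, k(k+1)/2, with each stage assumed to cost one exchange with a partner plus one comparison.

```python
# Step 1: characterize the machine as a table of primitive operations and
# their costs. These values are illustrative placeholders in arbitrary time
# units, NOT measured CM-2 costs.
MACHINE_COSTS = {"arithmetic": 1.0, "send": 10.0}


def bitonic_op_counts(n):
    """Step 2: count the primitive operations performed by the algorithm.

    Assumes n = 2**k keys with one key per processor. A bitonic sorting
    network on 2**k keys has k * (k + 1) / 2 compare-exchange stages; each
    stage is assumed here to cost one send and one arithmetic comparison.
    """
    if n < 1 or n & (n - 1) != 0:
        raise ValueError("n must be a power of two")
    k = n.bit_length() - 1          # exact log2 for a power of two
    stages = k * (k + 1) // 2
    return {"arithmetic": stages, "send": stages}


def predict_time(costs, counts):
    """Predicted running time: sum over operations of count times cost."""
    return sum(costs[op] * count for op, count in counts.items())


if __name__ == "__main__":
    for n in (8, 1024):
        counts = bitonic_op_counts(n)
        print(n, counts, predict_time(MACHINE_COSTS, counts))
```

Under these assumptions, tuning amounts to changing an implementation choice, recomputing the counts, and comparing the predicted totals before running anything on the machine.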

