Abstract

This paper investigates the potential performance of hierarchical, cache-consistent multiprocessors. We have developed a mean-value queueing model that considers bus latency, shared-memory latency, and bus interference as the primary sources of performance degradation in the system. A key feature of the model is its choice of high-level input parameters that can be related to application program characteristics. The model is also computationally efficient: very large systems can be analyzed in a matter of seconds. Results of the model show that system topology has an important effect on overall performance. We find optimal two-level, three-level, and four-level topologies that distribute the bus traffic uniformly across all levels in the hierarchy. We provide processing power estimates for the optimal topologies under a particular set of workload assumptions. For example, the optimal three-level topology supports 512 processors, each with a peak processing rate of 4 MIPS, and provides an effective 1400 MIPS of processing power if the buses operate at 20 MHz, or 1700 MIPS if they operate at 40 MHz. This result assumes that 22% of the data references are to globally shared data and that the shared data is read, on average, by a significant fraction of the processors between write operations. Results of our study also indicate that for reasonably low cache miss rates (3% at level 0) and 20 MHz buses, the bus subnetwork saturates with processor speeds of 6–8 MIPS, at least for topologies of five or fewer levels. Finally, we present parametric results that indicate how performance is affected by one of the parameters that characterizes data sharing in the workload.
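To put the three-level figures in perspective, the short sketch below compares the quoted effective processing power with the aggregate peak (512 processors at 4 MIPS each). It uses only the numbers stated in the abstract; the function and variable names are illustrative and not taken from the paper, and the calculation is plain arithmetic rather than any part of the queueing model itself.

```python
# Figures quoted in the abstract for the optimal three-level topology.
PROCESSORS = 512
PEAK_MIPS_PER_PROCESSOR = 4
# Bus clock (MHz) -> effective processing power (MIPS), as quoted.
EFFECTIVE_MIPS = {20: 1400, 40: 1700}


def utilization(effective_mips, processors=PROCESSORS,
                peak_mips=PEAK_MIPS_PER_PROCESSOR):
    """Fraction of the aggregate peak processing power that is delivered."""
    return effective_mips / (processors * peak_mips)


if __name__ == "__main__":
    aggregate_peak = PROCESSORS * PEAK_MIPS_PER_PROCESSOR  # 2048 MIPS
    for bus_mhz, mips in EFFECTIVE_MIPS.items():
        share = utilization(mips)
        print(f"{bus_mhz} MHz buses: {mips} of {aggregate_peak} MIPS "
              f"({share:.1%} of peak)")
```

Under these figures, roughly two thirds of the aggregate peak is delivered with 20 MHz buses and a little over four fifths with 40 MHz buses, so doubling the bus clock recovers only part of the gap to peak.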
