Abstract

This chapter discusses hardware aspects of attaining scalable performance while maintaining a shared-memory paradigm. Scalable shared-memory performance relies on a scalable memory system, whose fundamental performance is given by its bandwidth and latency. The ideal memory system provides unit-time access to all memory, and does so without contention between processors. A real memory system, however, has finite latency and can support only finite memory bandwidth. While memory banks can be added as processors are added, the interconnect between the processors and memory cannot provide linearly increasing bandwidth to all memory without costs that grow faster than linearly. Memory bandwidth therefore ultimately limits system scalability, but this limit does not appear to be significant for systems of several hundred or even a few thousand processors: systems of this size can use scalable direct interconnection networks to achieve sufficient bandwidth for memory access. The most serious bandwidth problems are caused by hot-spot references, which are most likely to target synchronization variables. For overall performance, long memory latency is far more limiting than low memory bandwidth: without good cache and memory locality, demand-driven cache misses can severely degrade processor utilization.
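The impact of latency on utilization can be illustrated with a standard back-of-envelope stall model (an illustration, not taken from the chapter): if every demand miss stalls the processor for the full memory latency, utilization falls as the product of miss rate and latency grows.

```python
# Simple stall model (illustrative assumption, not from the chapter):
# the processor blocks for the entire memory latency on each demand miss.
def utilization(miss_rate_per_instr, miss_latency_cycles, cpi_base=1.0):
    """Fraction of cycles spent on useful work rather than miss stalls."""
    stall_cycles_per_instr = miss_rate_per_instr * miss_latency_cycles
    return cpi_base / (cpi_base + stall_cycles_per_instr)

# A modest 2% miss rate with a 100-cycle remote-memory latency already
# cuts utilization to 1 / (1 + 0.02 * 100) = 1/3.
print(round(utilization(0.02, 100), 3))
```

Under this model, halving the miss rate through better locality has the same effect as halving the memory latency, which is why caching and data placement matter so much in scalable shared-memory machines.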
