Abstract

This chapter discusses hardware aspects of attaining scalable performance while maintaining a shared-memory paradigm. Scalable shared-memory performance relies on a scalable memory system, whose fundamental performance is given by its bandwidth and latency. The ideal memory system provides unit-time access to all memory, and does so without contention between processors. A real memory system, however, has finite latency and can support only finite memory bandwidth. While memory banks can be added as processors are added, the interconnect between the processors and memory cannot provide linearly increasing bandwidth to all memory without costs that grow faster than linearly. Memory bandwidth therefore ultimately limits system scalability, but this limit does not appear to be significant for systems of several hundred or even a few thousand processors: systems of this size can use scalable direct interconnection networks to achieve sufficient bandwidth for memory access. The most serious bandwidth problems are caused by hot-spot references, which are most likely to target synchronization variables. For overall performance, long memory latency is far more limiting than low memory bandwidth: without good cache and memory locality, demand-driven cache misses can severely degrade processor utilization.
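The impact of latency on utilization can be illustrated with a standard back-of-envelope stall model (an illustration, not taken from the chapter): if every demand miss stalls the processor for the full memory latency, utilization falls as the product of miss rate and latency grows.

```python
# Simple stall model (illustrative assumption, not from the chapter):
# the processor blocks for the entire memory latency on each demand miss.
def utilization(miss_rate_per_instr, miss_latency_cycles, cpi_base=1.0):
    """Fraction of cycles spent on useful work rather than miss stalls."""
    stall_cycles_per_instr = miss_rate_per_instr * miss_latency_cycles
    return cpi_base / (cpi_base + stall_cycles_per_instr)

# A modest 2% miss rate with a 100-cycle remote-memory latency already
# cuts utilization to 1 / (1 + 0.02 * 100) = 1/3.
print(round(utilization(0.02, 100), 3))
```

Under this model, halving the miss rate through better locality has the same effect as halving the memory latency, which is why caching and data placement matter so much in scalable shared-memory machines.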
