Abstract

In scalable multiprocessor architectures, the time required for a processor to access memory varies with the portion of memory being accessed. In this paper, we consider how this characteristic affects performance by comparing such non-uniform memory access (NUMA) systems to the ideal but unrealizable case in which the access times to all memory modules remain constant even as the number of processors is increased. We examine several application kernels to investigate how well they would execute on various instances of NUMA systems with a hierarchical memory structure. The results of our analytic model show that access locality is much more important in NUMA architectures than it is in UMA architectures. The extent of the performance penalty for non-local memory accesses depends on the variability in access times to the various parts of shared memory, as well as on the amount of congestion in the interconnection network that provides access to remote memory modules. For the applications we examined, we found that it is possible to partition and place both the data and the computation in such a way that reasonable speedups can be achieved on NUMA systems.
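The intuition behind the locality argument can be illustrated with a simple two-level cost model. This is a hedged sketch, not the paper's actual analytic model: the function names, the single `congestion` scaling factor, and the latency values below are illustrative assumptions.

```python
# Illustrative two-level NUMA cost model (an assumption for exposition,
# not the analytic model from the paper): a fraction `local_frac` of
# memory references hit the processor's local module, and the remainder
# pay a remote latency inflated by a congestion factor.

def effective_access_time(local_frac, t_local, t_remote, congestion=1.0):
    """Mean per-reference access time.

    congestion >= 1.0 crudely models contention in the interconnection
    network by scaling the remote latency.
    """
    return local_frac * t_local + (1.0 - local_frac) * t_remote * congestion

def numa_slowdown(local_frac, t_local, t_remote, congestion=1.0):
    """Slowdown relative to an ideal UMA machine in which every
    reference costs t_local."""
    return effective_access_time(local_frac, t_local, t_remote, congestion) / t_local

# With remote accesses 5x slower than local ones, 90% locality costs
# only a 1.4x slowdown, while 50% locality costs 3x:
print(numa_slowdown(0.9, 1.0, 5.0))  # -> 1.4
print(numa_slowdown(0.5, 1.0, 5.0))  # -> 3.0
```

Even this crude model shows why locality dominates: the penalty grows linearly with the remote fraction and multiplicatively with both the remote/local latency ratio and network congestion, which matches the qualitative conclusion of the abstract.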
