Abstract

Shared memory applications running transparently on top of NUMA architectures often face severe performance problems due to poor data locality and excessive remote memory accesses. Optimizations with respect to data locality are therefore necessary, but they require a fundamental understanding of an application's memory access behavior. The necessary information cannot be obtained using simple code instrumentation because of the implicit nature of the communication handled by the NUMA hardware, the large amount of traffic produced at runtime, and the fine access granularity in shared memory codes. This paper presents an approach that overcomes these problems and thereby enables an easy and efficient optimization process. Based on a low-level hardware monitoring facility in coordination with a comprehensive visualization tool, it enables the generation of memory access histograms capable of showing all memory accesses across the complete address space of an application's working set. This information can be used to identify access hot spots, to understand the dynamic behavior of shared memory applications, and to optimize applications using an application-specific data layout, resulting in significant performance improvements.
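The idea of a memory access histogram can be illustrated with a minimal sketch. It assumes a trace of `(node_id, address)` events and a 4 KiB page size. Both are assumptions made here for illustration and do not describe the output format of the SMiLE hardware monitor. The function names `build_histogram`, `hot_spots`, and `preferred_home` are likewise hypothetical.

```python
from collections import Counter, defaultdict

PAGE_SIZE = 4096  # assumed page size in bytes (not taken from the paper)


def build_histogram(trace, page_size=PAGE_SIZE):
    """Count accesses per page and per accessing node.

    `trace` is an iterable of (node_id, address) events; this event
    format is a hypothetical stand-in for real monitor output.
    """
    hist = defaultdict(Counter)
    for node_id, address in trace:
        hist[address // page_size][node_id] += 1
    return hist


def hot_spots(hist, top=3):
    """Return the `top` most frequently accessed pages, busiest first."""
    totals = {page: sum(counts.values()) for page, counts in hist.items()}
    return sorted(totals, key=lambda page: (-totals[page], page))[:top]


def preferred_home(hist):
    """For each page, pick the node that accesses it most often.

    Ties are resolved in favor of the lowest node id.
    """
    return {
        page: max(sorted(counts), key=counts.__getitem__)
        for page, counts in hist.items()
    }


if __name__ == "__main__":
    # Tiny synthetic trace: (node_id, address)
    trace = [(0, 0x0000), (1, 0x0010), (1, 0x0020),
             (2, 0x1000), (1, 0x1008), (1, 0x2000)]
    hist = build_histogram(trace)
    print("hot spots:", hot_spots(hist, top=2))
    print("preferred home nodes:", preferred_home(hist))
```

Placing each page on the node that accesses it most often is only one simple layout heuristic, chosen here to show how histogram data could feed a data layout decision. It is not presented as the optimization procedure used in the paper.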

Highlights

  • Tao et al., "Memory access behavior analysis of NUMA-based shared memory programs": [...] architecture, a software framework has been developed within the SMiLE project (Shared Memory in a LAN-like Environment) which closes the semantic gap between the global view of the distributed physical memories in NUMA architectures and the global virtual memory abstraction required by shared memory programming models [8,17].

  • [...] applications will be penalized by excessive remote memory accesses and their significantly higher latencies.

  • A low-level hardware monitoring facility, coordinated with a comprehensive toolset, has to be provided to enable users to perform the required optimizations. Such an environment has been presented in this work. It consists of a low-level hardware monitor capable of observing the complete inter-node memory access traffic across the interconnection network, and a tool infrastructure that transforms the gathered information about the runtime behavior of the application into a human-readable form and enhances it with additional information acquired through the various layers of the runtime environment.

Summary

Motivation

The development of parallel programs which run efficiently on parallel machines is a difficult task and takes much more effort than the development of sequential codes. [...] architecture, a software framework has been developed within the SMiLE project (Shared Memory in a LAN-like Environment) which closes the semantic gap between the global view of the distributed physical memories in NUMA architectures and the global virtual memory abstraction required by shared memory programming models [8,17]. This framework supports, in principle, almost arbitrary shared memory programming models on top of the PC cluster [13] and thereby creates a flexible target platform for the presented monitoring approach.

Shared memory in NUMA clusters
Observing shared memory accesses
Challenges
The SMiLE monitoring approach
The SMiLE tool infrastructure
Access behavior analysis
The visualization tool
Analyzing a sample code
Using the information for easy optimization
Related work
Conclusions