Abstract

Modern multi-core architectures are characterized by Non-Uniform Memory Access (NUMA). Exploiting such architectures efficiently is difficult for programmers: on these systems, multi-threaded programs may suffer high memory access latency if the mapping of data and computation to nodes is not considered carefully. Programmers therefore need tools that detect the performance problems behind high memory access latency. To address this need, we present a profiling tool called LaProf, which uses memory access latency information to detect performance problems on NUMA systems. The tool detects three performance problems of multi-threaded programs: 1) data sharing, where shared data causes remote memory accesses if the threads that access it are not placed on the same NUMA node; 2) shared resource contention, where contention on shared resources such as last-level caches, interconnect links, and memory controllers raises memory access latency and severely degrades performance; 3) remote access imbalance, where the thread with the largest number of remote data accesses becomes the critical thread that slows down the whole multi-threaded program. After detection with LaProf, applying simple and general NUMA optimization techniques yields performance improvements of 88%, 32%, and 99% for the three problems, respectively.
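As an illustration of the third problem, remote access imbalance, the following minimal sketch counts remote accesses per thread from a list of memory access samples and reports the thread with the most. It is not LaProf's implementation: the sample layout `(thread_id, cpu_node, mem_node, latency_cycles)`, the function names, and the imbalance ratio are all assumptions made for this example.

```python
from collections import Counter


def remote_access_counts(samples):
    """Count remote accesses per thread.

    Each sample is a hypothetical tuple
    (thread_id, cpu_node, mem_node, latency_cycles). An access is treated
    as remote when the node running the thread differs from the node
    holding the data.
    """
    counts = Counter()
    for thread_id, cpu_node, mem_node, _latency in samples:
        # Adding 0 still registers the thread, so purely local threads
        # are included when the mean is computed.
        counts[thread_id] += int(cpu_node != mem_node)
    return counts


def critical_thread(samples):
    """Return (thread_id, ratio) for the thread with the most remote accesses.

    ratio is that thread's remote access count divided by the mean count
    over all threads; a value well above 1 suggests an imbalance.
    """
    counts = remote_access_counts(samples)
    if not counts:
        return None, 0.0
    thread_id, worst = max(counts.items(), key=lambda kv: kv[1])
    mean = sum(counts.values()) / len(counts)
    ratio = worst / mean if mean > 0 else 0.0
    return thread_id, ratio


if __name__ == "__main__":
    # Made-up samples: thread 1 runs on node 0 but reads data on node 1.
    samples = [
        (0, 0, 0, 90), (0, 0, 0, 95),
        (1, 0, 1, 310), (1, 0, 1, 290), (1, 0, 1, 305),
        (2, 1, 1, 100), (2, 1, 0, 280),
    ]
    print(critical_thread(samples))
```

In practice the samples would come from a hardware sampling facility that reports the data source and latency of each access; how LaProf collects and classifies them is not described in this abstract.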
