Abstract

The mapping of tasks to processor cores, called task mapping, is crucial to achieving scalable performance on multicore processors. On modern NUMA (non-uniform memory access) systems, memory congestion can degrade performance more severely than poor data locality, because heavy congestion on shared caches and memory controllers can cause long access latencies. Conventional work on task mapping mostly focuses on improving the locality of memory accesses. However, our previous work showed that on modern NUMA systems, maximizing locality can degrade performance because of memory congestion. In this work, we propose a task mapping method that addresses both the locality and the memory congestion problems to improve the performance of parallel applications. In the proposed method, the spatial and temporal communication behaviors of an application are first analyzed from a time-series dataset of the communications among its parallel tasks. Then, a data clustering technique is employed to detect groups of tasks that potentially cause memory congestion. Finally, this information is used to compute a task mapping that improves locality and reduces memory congestion. We also provide a set of metrics to describe the communication behaviors and to evaluate whether the target application can benefit from our method. The proposed method is evaluated with applications from the NPB and PARSEC benchmark suites on a real NUMA system and on a multicore simulator, and a detailed analysis of the sources of performance gain is provided. Experimental results show that our method achieves up to a 61% performance improvement over the state-of-the-art locality-based method.
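The first two steps of the method (analyze the communication time series, then cluster tasks) can be illustrated with a minimal sketch. It assumes the time-series dataset is a list of per-window n×n matrices of communication volume between tasks. It also assumes that tasks whose traffic peaks in the same windows are the candidates for congestion, and it uses plain k-means as the clustering technique. The paper's actual data format and clustering technique may differ, and the function names are hypothetical.

```python
from typing import List

Matrix = List[List[float]]


def spatial_matrix(series: List[Matrix]) -> Matrix:
    """Spatial behavior: sum the per-window matrices into one task-by-task matrix."""
    n = len(series[0])
    total = [[0.0] * n for _ in range(n)]
    for window in series:
        for i in range(n):
            for j in range(n):
                total[i][j] += window[i][j]
    return total


def temporal_profiles(series: List[Matrix]) -> Matrix:
    """Temporal behavior: per-task traffic (sent + received) in each window,
    normalized to sum to 1 so that tasks are compared by *when* they are active."""
    n = len(series[0])
    profiles = []
    for i in range(n):
        vol = [sum(w[i]) + sum(row[i] for row in w) for w in series]
        s = sum(vol)
        profiles.append([v / s for v in vol] if s > 0 else vol)
    return profiles


def _sqdist(a: List[float], b: List[float]) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def cluster_tasks(profiles: Matrix, k: int, iters: int = 20) -> List[int]:
    """Hypothetical clustering step: k-means with farthest-point initialization.
    Tasks that share a label are active in the same windows."""
    centroids = [list(profiles[0])]
    while len(centroids) < k:
        far = max(profiles, key=lambda p: min(_sqdist(p, c) for c in centroids))
        centroids.append(list(far))
    labels = [0] * len(profiles)
    for _ in range(iters):
        labels = [min(range(k), key=lambda c: _sqdist(p, centroids[c])) for p in profiles]
        for c in range(k):
            members = [p for p, lab in zip(profiles, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


# Tasks 0<->1 communicate in window 0; tasks 2<->3 communicate in window 1.
series = [
    [[0, 5, 0, 0], [5, 0, 0, 0], [0, 0, 0, 0], [0, 0, 0, 0]],
    [[0, 0, 0, 0], [0, 0, 0, 0], [0, 0, 0, 7], [0, 0, 7, 0]],
]
print(spatial_matrix(series)[0][1])                   # 5.0
print(cluster_tasks(temporal_profiles(series), k=2))  # [0, 0, 1, 1]
```

Normalizing each profile separates the question of *when* a task communicates from *how much* it communicates. The spatial matrix keeps the volume information needed for locality.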

Highlights

  • Task mapping is an important step in achieving scalable performance on modern multicore processors

  • We present a task mapping method, called decongested locality (DeLoc), that considers both the spatial and temporal communication behaviors of a parallel application to improve the locality and to reduce the memory congestion on modern NUMA systems

  • In this paper, we propose a task mapping method, DeLoc, to address both the locality and memory congestion problems
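The tension between locality and congestion can be shown with a small greedy mapper. This sketch uses a hypothetical objective and is not DeLoc's published algorithm. Each candidate NUMA node is scored by the communication volume a task would share with tasks already placed there. The score is then reduced by a penalty `alpha` for every co-placed task from the same congestion cluster, for example labels from a clustering step. The names `map_tasks` and `alpha` are illustrative assumptions.

```python
from typing import List


def map_tasks(comm: List[List[float]], clusters: List[int], num_nodes: int,
              cores_per_node: int, alpha: float = 1.0) -> List[int]:
    """Greedy task-to-NUMA-node mapping under a hypothetical objective:

    score(node) = traffic shared with tasks already on node
                  - alpha * (same-cluster tasks already on node)
    """
    n = len(comm)
    if n > num_nodes * cores_per_node:
        raise ValueError("more tasks than cores")
    placement = [-1] * n
    used = [0] * num_nodes
    # Place the tasks with the most total traffic (sent + received) first.
    order = sorted(range(n), key=lambda t: -(sum(comm[t]) + sum(row[t] for row in comm)))
    for t in order:
        best_node, best_score = -1, float("-inf")
        for node in range(num_nodes):
            if used[node] >= cores_per_node:
                continue  # node is full
            on_node = [u for u in range(n) if placement[u] == node]
            locality = sum(comm[t][u] + comm[u][t] for u in on_node)
            congestion = sum(1 for u in on_node if clusters[u] == clusters[t])
            score = locality - alpha * congestion
            if score > best_score:  # strict: ties go to the lowest node index
                best_node, best_score = node, score
        placement[t] = best_node
        used[best_node] += 1
    return placement


# Tasks 0<->1 and 2<->3 exchange data; each pair is also active at the same time.
comm = [[0, 10, 0, 0], [10, 0, 0, 0], [0, 0, 0, 10], [0, 0, 10, 0]]
clusters = [0, 0, 1, 1]
print(map_tasks(comm, clusters, 2, 2, alpha=1.0))    # [0, 0, 1, 1]
print(map_tasks(comm, clusters, 2, 2, alpha=100.0))  # [0, 1, 0, 1]
```

With a small `alpha`, locality dominates and each communicating pair shares a node. With a large `alpha`, co-active tasks are spread across nodes at the cost of remote communication. A practical mapper would need to pick a point between these extremes.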



INTRODUCTION

Task mapping is an important step in achieving scalable performance on modern multicore processors. We present a task mapping method, called decongested locality (DeLoc), that considers both the spatial and temporal communication behaviors of a parallel application to improve the locality and to reduce the memory congestion on modern NUMA systems. A locality-based task mapping method gains larger performance improvements for parallel applications with higher values of CommLoc, one of the proposed metrics of communication behavior. Even in a parallel application with a low or zero communication-to-memory ratio, a task mapping method can still affect the memory access behavior. In this case, distributing the non-communicating memory accesses across the NUMA nodes can reduce memory congestion, because it improves the balance of memory accesses among the nodes. We discuss the impacts of our method on both the communication locality and the memory congestion in Sections IV-A-3 and IV-B.
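The balancing argument for applications with little communication can be made concrete with a short sketch. It assumes each task's memory traffic is a single access count with no inter-task communication. It uses a longest-processing-time-style greedy: the heaviest task goes first, onto the least-loaded node that still has a free core. The function names, the compact baseline, and the max-over-mean imbalance measure are hypothetical illustrations, not the paper's metrics.

```python
from typing import List


def balanced_placement(mem_accesses: List[int], num_nodes: int,
                       cores_per_node: int) -> List[int]:
    """Greedy balancing: heaviest task first, onto the least-loaded node with a free core."""
    if len(mem_accesses) > num_nodes * cores_per_node:
        raise ValueError("more tasks than cores")
    placement = [-1] * len(mem_accesses)
    load = [0] * num_nodes   # memory accesses served by each node
    used = [0] * num_nodes   # occupied cores on each node
    for t in sorted(range(len(mem_accesses)), key=lambda t: -mem_accesses[t]):
        free = [nd for nd in range(num_nodes) if used[nd] < cores_per_node]
        node = min(free, key=lambda nd: load[nd])
        placement[t] = node
        load[node] += mem_accesses[t]
        used[node] += 1
    return placement


def imbalance(mem_accesses: List[int], placement: List[int], num_nodes: int) -> float:
    """Hypothetical measure: busiest node's accesses divided by the per-node mean."""
    load = [0] * num_nodes
    for t, node in enumerate(placement):
        load[node] += mem_accesses[t]
    total = sum(load)
    return max(load) / (total / num_nodes) if total else 1.0


mem = [100, 100, 100, 100]          # four non-communicating tasks
packed = [0, 0, 0, 0]               # compact placement: all on node 0
spread = balanced_placement(mem, num_nodes=2, cores_per_node=4)
print(imbalance(mem, packed, 2))    # 2.0
print(spread, imbalance(mem, spread, 2))  # [0, 1, 0, 1] 1.0
```

The compact placement sends every access to one memory controller. The spread placement halves the peak load per node, which is the kind of balance the paragraph above credits with reducing congestion.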

PERFORMANCE EVALUATION ON A REAL SYSTEM
RELATED WORK
CONCLUSION AND FUTURE WORK