Abstract

The increasing prevalence of co-processors such as the Intel Xeon Phi, has been reshaping the high performance computing (HPC) landscape. The Xeon Phi comes with a large number of power efficient CPU cores, but at the same time, it's a highly memory constraint environment leaving the task of memory management entirely up to application developers. To reduce programming complexity, we are focusing on application transparent, operating system (OS) level hierarchical memory management.In particular, we first show that state of the art page replacement policies, such as approximations of the least recently used (LRU) policy, are not good candidates for massive many-cores due to their inherent cost of remote translation lookaside buffer (TLB) invalidations, which are inevitable for collecting page usage statistics. The price of concurrent remote TLB invalidations grows rapidly with the number of CPU cores in many-core systems and outpace the benefits of the page replacement algorithm itself. Building upon our previous proposal, per-core Partially Separated Page Tables (PSPT), in this paper we propose Core-Map Count based Priority (CMCP) page replacement policy, which exploits the auxiliary knowledge of the number of mapping CPU cores of each page and prioritizes them accordingly. In turn, it can avoid TLB invalidations for page usage statistic purposes altogether. Additionally, we describe and provide an implementation of the experimental 64kB page support of the Intel Xeon Phi and reveal some intriguing insights regarding its performance. We evaluate our proposal on various applications and find that CMCP can outperform state of the art page replacement policies by up to 38%. We also show that the choice of appropriate page size depends primarily on the degree of memory constraint in the system.

Full Text
Published version (Free)

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call