Abstract

DIMM-type 3D XPoint memory is scheduled to launch in 2018, and machines that use such devices as large main memory will become commodity in the near future. It is therefore important to evaluate application performance on such machine configurations in advance, taking into account the effects of higher main memory latency. This paper proposes an accurate, high-throughput evaluation methodology that makes exhaustive experiments feasible across many applications and a wide range of multidimensional conditions. The target architecture is a manycore processor such as Xeon Phi KNL, assumed to have a large DRAM cache in addition to 3D XPoint main memory. To evaluate latency effects accurately, stall cycles caused by main memory accesses must be taken into account. Cycle-accurate simulators can do this but are too slow for exhaustive experiments, so we instead use the processor's hardware performance counters. However, the current Xeon Phi KNL does not provide performance counters for these stalls. To address this issue, our method integrates measurement results from Xeon Skylake-SP, which has the required performance counters and a memory system close to that of KNL. The paper presents the results of exhaustive experiments that take two days with the proposed method and cover arbitrary latency settings. With a cycle-accurate simulator, equivalent experiments would take about 180 years per latency setting.
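
To illustrate why memory stall cycles matter for this kind of evaluation, the following minimal Python sketch projects execution cycles under a different main memory latency. It is not the paper's model: it assumes, purely for illustration, that stall cycles caused by main memory accesses scale linearly with latency and that all other cycles stay the same. The function names (`estimate_cycles`, `sweep_slowdown`) and all parameters are hypothetical.

```python
def estimate_cycles(total_cycles, mem_stall_cycles, base_latency_ns, target_latency_ns):
    """Project execution cycles at a different main memory latency.

    Illustrative assumption (not taken from the paper): memory stall
    cycles scale linearly with latency; non-stall cycles are unchanged.
    """
    if not 0 <= mem_stall_cycles <= total_cycles:
        raise ValueError("mem_stall_cycles must be between 0 and total_cycles")
    if base_latency_ns <= 0 or target_latency_ns <= 0:
        raise ValueError("latencies must be positive")
    scale = target_latency_ns / base_latency_ns
    non_stall_cycles = total_cycles - mem_stall_cycles
    return non_stall_cycles + mem_stall_cycles * scale


def sweep_slowdown(total_cycles, mem_stall_cycles, base_latency_ns, latencies_ns):
    """Return the projected slowdown factor for each candidate latency."""
    return {
        latency: estimate_cycles(total_cycles, mem_stall_cycles,
                                 base_latency_ns, latency) / total_cycles
        for latency in latencies_ns
    }
```

Under these assumptions, one set of counter measurements can be reused for any number of latency settings, since each projection is a single arithmetic step instead of a new simulation run.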
