Abstract

The purpose of this study is to quantitatively assess the performance of graph processing algorithms for large scale-free graphs residing in byte-addressable Non-Volatile Memory (NVM). Our study focuses on static and dynamic graph algorithms previously optimized for external memory in the form of locally attached NAND Flash arrays, with data structures tuned to maximize locality. The evaluation is run on a unique resource, an NVM hardware emulator from Intel capable of inserting delays to memory reads through microcode instructions thatdelay load instructions missing in L3; the emulated NVM appears as separate UMA node (identified by a physical address range) and that is not part of the socket-attached NUMA nodes. In this work, we distinguish two graph processing configurations, 'semi-external' in which the graph is fully resident in NVM but in-flight intermediate data structures reside in DRAM, and 'fully external' in which both the graph and the intermediate data structures reside in NVM. Our goal is to assess the performance impact of NVM latency of up to 3.5X DRAM, with (semi-external) and without (fully external) an application-specific scratchpad for the in-flight data structures. We find a performance penalty of 59.6% in the fully external scenario, which is reduced to 5.2% with the scratchpad. Our results show that graph algorithms employing locality aware data structure layout and processing can benefit immediately from emerging NVMs with minimal performance impact, making NVM a high value resource for large scale graph processing.

Full Text
Paper version not known

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call