Abstract

Modern shared-memory systems embrace the NUMA architecture, which has proven more scalable than the SMP architecture. In many ways, a NUMA system resembles a shared-nothing distributed system: it consists of physically distinct processing units and memory regions, and memory accesses to remote NUMA domains are more expensive than local accesses. This presents an opportunity to transfer the know-how and design of distributed graph processing to shared-memory graph processing solutions optimized for NUMA systems. To this end, we explore whether a distributed-memory-style middleware that makes graph partitioning and communication between partitions explicit can improve performance on a NUMA system. We design and implement a NUMA-aware graph processing framework that embraces the design philosophies of distributed graph processing systems, in particular explicit partitioning and inter-partition communication, while also exploiting optimization opportunities specific to single-node systems. We demonstrate up to 13.9x speedup over a state-of-the-art NUMA-aware framework, Polymer, and up to 3.7x scalability on a four-socket NUMA machine using graphs with tens of billions of edges.
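
To illustrate the design philosophy the abstract describes, below is a minimal, hypothetical sketch (not the paper's actual implementation) of per-NUMA-node partitions with explicit inter-partition message buffers, assuming libnuma is available; identifiers such as `Partition` and `make_partition` are illustrative only.

```cpp
// Hypothetical sketch: NUMA-local graph partitions with explicit
// inter-partition communication buffers. Requires libnuma (-lnuma).
#include <numa.h>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

struct Partition {
    int node;                  // NUMA node owning this partition
    std::size_t num_vertices;  // vertices assigned to this partition
    double* values;            // vertex values, allocated on the owning node
};

// Allocate a partition's vertex data on its own NUMA node so that worker
// threads pinned to that node perform only local memory accesses.
Partition make_partition(int node, std::size_t num_vertices) {
    void* mem = numa_alloc_onnode(num_vertices * sizeof(double), node);
    assert(mem != nullptr);
    return Partition{node, num_vertices, static_cast<double*>(mem)};
}

int main() {
    if (numa_available() < 0) return 1;  // NUMA API not usable on this system

    int nodes = numa_num_configured_nodes();
    std::size_t vertices_per_part = 1 << 20;

    // One partition per NUMA node, each with node-local vertex data.
    std::vector<Partition> parts;
    for (int n = 0; n < nodes; ++n)
        parts.push_back(make_partition(n, vertices_per_part));

    // Explicit inter-partition communication: updates destined for a remote
    // partition are staged in per-destination buffers and applied in bulk,
    // rather than issued as fine-grained remote writes.
    std::vector<std::vector<std::pair<std::uint64_t, double>>> outbox(nodes);
    outbox[1 % nodes].push_back({42, 3.14});  // update for vertex 42, node 1

    for (auto& p : parts)
        numa_free(p.values, p.num_vertices * sizeof(double));
    return 0;
}
```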
