Abstract

Modern shared-memory systems embrace the NUMA architecture, which has proven more scalable than the SMP architecture. In many ways, a NUMA system resembles a shared-nothing distributed system: it consists of physically distinct processing units and memory regions, and memory accesses to remote NUMA domains are more expensive than local accesses. This presents an opportunity to transfer the know-how and design of distributed graph processing to shared-memory graph processing solutions optimized for NUMA systems. To this end, we explore whether a distributed-memory-style middleware that makes graph partitioning and communication between partitions explicit can improve performance on a NUMA system. We design and implement a NUMA-aware graph processing framework that embraces the design philosophies of distributed graph processing systems, in particular explicit partitioning and inter-partition communication, while also exploiting optimization opportunities specific to single-node systems. We demonstrate up to 13.9x speedup over a state-of-the-art NUMA-aware framework, Polymer, and up to 3.7x scalability on a four-socket NUMA machine using graphs with tens of billions of edges.
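
To illustrate the design philosophy the abstract describes, below is a minimal, hypothetical sketch (not the paper's actual implementation) of per-NUMA-node partitions with explicit inter-partition message buffers, assuming libnuma is available; identifiers such as `Partition` and `make_partition` are illustrative only.

```cpp
// Hypothetical sketch: NUMA-local graph partitions with explicit
// inter-partition communication buffers. Requires libnuma (-lnuma).
#include <numa.h>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

struct Partition {
    int node;                  // NUMA node owning this partition
    std::size_t num_vertices;  // vertices assigned to this partition
    double* values;            // vertex values, allocated on the owning node
};

// Allocate a partition's vertex data on its own NUMA node so that worker
// threads pinned to that node perform only local memory accesses.
Partition make_partition(int node, std::size_t num_vertices) {
    void* mem = numa_alloc_onnode(num_vertices * sizeof(double), node);
    assert(mem != nullptr);
    return Partition{node, num_vertices, static_cast<double*>(mem)};
}

int main() {
    if (numa_available() < 0) return 1;  // NUMA API not usable on this system

    int nodes = numa_num_configured_nodes();
    std::size_t vertices_per_part = 1 << 20;

    // One partition per NUMA node, each with node-local vertex data.
    std::vector<Partition> parts;
    for (int n = 0; n < nodes; ++n)
        parts.push_back(make_partition(n, vertices_per_part));

    // Explicit inter-partition communication: updates destined for a remote
    // partition are staged in per-destination buffers and applied in bulk,
    // rather than issued as fine-grained remote writes.
    std::vector<std::vector<std::pair<std::uint64_t, double>>> outbox(nodes);
    outbox[1 % nodes].push_back({42, 3.14});  // update for vertex 42, node 1

    for (auto& p : parts)
        numa_free(p.values, p.num_vertices * sizeof(double));
    return 0;
}
```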
