Abstract

Graphs are used to store relationships on a variety of topics, such as road map data and social media connections. Processing this data allows one to uncover insights from its structure. However, when graphs are analyzed on traditional processors, their connectivity leads to irregular memory access patterns and poor data locality, which in turn degrades performance. Processing-in-Memory (PIM) is an attractive alternative for graph processing, as it can reduce data movement by bringing the computation closer to the data itself. While PIM-based techniques have been shown to improve graph processing performance, critical bottlenecks remain when multiple PIM-based accelerators are connected into larger clusters. Although a number of recent proposals aim to reduce inter-accelerator data movement, they generally overlook how a graph's connectivity can be exploited to produce a more efficient hardware mapping. In fact, many real-world graphs have a small percentage of high-degree nodes that connect widely to a large number of other nodes. By clustering these nodes into communities, one can map them to hardware more efficiently, minimizing expensive inter-accelerator communication, a key performance bottleneck in these accelerators. To capitalize on this observation, we propose PIM-GraphSCC, the first PIM-based graph processor that exploits a graph's connectivity to significantly reduce communication over a critical resource: the inter-accelerator links. PIM-GraphSCC's community-aware graph partitioning scheme reduces inter-accelerator data movement by up to 93% compared to modern graph processing schemes.
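
To make the intuition behind community-aware partitioning concrete, the sketch below (not the PIM-GraphSCC implementation; the label-propagation clustering, greedy packing, and all names are illustrative assumptions) clusters a toy graph into communities, packs whole communities onto a fixed number of accelerators, and counts the edges that cross accelerator boundaries as a proxy for inter-accelerator traffic, comparing against a naive hash-style partition.

```python
# Illustrative sketch only: a toy community-aware partitioner versus a naive
# hash partitioner, compared by the number of edges crossing accelerator
# boundaries (a proxy for inter-accelerator traffic). This is NOT the
# PIM-GraphSCC algorithm; all functions and parameters are hypothetical.
from collections import defaultdict
import random

def label_propagation(adj, rounds=10, seed=0):
    """Cheap community detection: each node repeatedly adopts the most
    common label among its neighbours."""
    rng = random.Random(seed)
    labels = {v: v for v in adj}
    nodes = list(adj)
    for _ in range(rounds):
        rng.shuffle(nodes)
        for v in nodes:
            if not adj[v]:
                continue
            counts = defaultdict(int)
            for u in adj[v]:
                counts[labels[u]] += 1
            labels[v] = max(counts, key=counts.get)
    return labels

def assign_communities(labels, num_accels):
    """Greedily pack whole communities onto accelerators, balancing node count."""
    members = defaultdict(list)
    for v, c in labels.items():
        members[c].append(v)
    load = [0] * num_accels
    placement = {}
    # Place the largest communities first.
    for c in sorted(members, key=lambda c: -len(members[c])):
        target = min(range(num_accels), key=lambda a: load[a])
        for v in members[c]:
            placement[v] = target
        load[target] += len(members[c])
    return placement

def cross_edges(adj, placement):
    """Count undirected edges whose endpoints land on different accelerators."""
    cut = 0
    for v, neighbours in adj.items():
        for u in neighbours:
            if v < u and placement[v] != placement[u]:
                cut += 1
    return cut

if __name__ == "__main__":
    # Toy graph: four dense clusters joined by a few sparse bridge edges.
    rng = random.Random(1)
    adj = defaultdict(set)
    clusters = [range(i * 50, (i + 1) * 50) for i in range(4)]
    for cluster in clusters:
        for v in cluster:
            for u in rng.sample(list(cluster), 5):
                if u != v:
                    adj[v].add(u)
                    adj[u].add(v)
    for i in range(3):  # sparse bridges between clusters
        adj[i * 50].add((i + 1) * 50)
        adj[(i + 1) * 50].add(i * 50)

    naive = {v: v % 4 for v in adj}                  # hash-style partition
    aware = assign_communities(label_propagation(adj), 4)
    print("cross-accelerator edges (hash):     ", cross_edges(adj, naive))
    print("cross-accelerator edges (community):", cross_edges(adj, aware))
```

On this kind of clustered input, the hash-style partition scatters each dense cluster across all accelerators and cuts most of its edges, while the community-aware mapping keeps clusters local and leaves only the few bridge edges as inter-accelerator traffic, which is the effect the abstract describes.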
