Streaming graph processing performs batched updates and analytics on a time-evolving graph. The underlying representation format of the graph largely determines the throughputs of these updates and analytics phases. Existing representation formats usually employ variations of hash tables or adjacency lists. However, a recent study showed that the adjacency-list-based approaches perform poorly on heavy-tailed graphs, and the hash table-based approaches suffer on short-tailed graphs. We propose GraphTango, a hybrid representation format that provides excellent update and analytics throughput regardless of the graph’s degree distribution. GraphTango dynamically switches among three different formats based on a vertex’s degree: (i) Low-degree vertices store the edges directly with the neighborhood metadata, confining accesses to a single cache line, (2) Medium-degree vertices use adjacency lists, and (3) High-degree vertices use hash tables as well as adjacency lists. In this case, the adjacency list provides fast traversal during the analytics phase, while the hash table provides constant-time lookups during the update phase. We further optimized the performance by designing an open-addressing-based hash table that fully utilizes every fetched cache line. In addition, we developed a thread-local lock-free memory pool that allows fast growing/shrinking of the adjacency lists and hash tables in a multi-threaded environment. We evaluated GraphTango with the help of the SAGA-Bench framework and compared it with four other representation formats: Stinger, Degree-aware Robin Hood Hashing, and two adjacency list-based formats with different workload balancing scheme. On average, GraphTango provides 4.5x higher insertion throughput, 3.2x higher deletion throughput, and 1.1x higher analytics throughput over the next best format. Furthermore, we integrated GraphTango with the state-of-the-art graph processing frameworks DZiG and RisGraph. Compared to the vanilla DZiG and vanilla RisGraph, [GraphTango + DZiG] and [GraphTango + RisGraph] reduces the average batch processing time by 2.3x and 1.5x, respectively.
Read full abstract