Planting trees for scalable and efficient canonical hub labeling

Kartik Lakhotia, Rajgopal Kannan, Qing Dong, Viktor Prasanna

doi:10.14778/3372716.3372722

Abstract

Hub labeling is widely used to improve the latency and throughput of Point-to-Point Shortest Distance (PPSD) queries in graph databases. However, constructing a hub labeling, even via the state-of-the-art Pruned Landmark Labeling (PLL) algorithm, is computationally intensive. PLL also has a sequential, root-order label dependency that makes it challenging to parallelize. Hence, existing parallel approaches are often plagued by increased label size, poor scalability, and an inability to process large weighted graphs. In this paper, we develop novel algorithms that construct the Canonical Hub Labeling, which is guaranteed to be minimal, on shared- and distributed-memory parallel systems in a scalable and efficient manner. Our key contribution, the PLaNT algorithm, provides an embarrassingly parallel approach to label construction that scales well beyond the limits of current practice. Our approach is the first to employ a collaborative label partitioning scheme across multiple nodes of a cluster, enabling completely in-memory labeling and parallel querying on massive graphs whose labels cannot fit on a single node. On a single node with 72 threads, our shared-memory algorithm is up to 47.4X faster than sequential PLL. While our labeling time is comparable to that of the state-of-the-art shared-memory paraPLL, our label size is 17% smaller on average. PLaNT demonstrates superior parallel scalability: it can process significantly larger graphs and construct labelings orders of magnitude faster than the state-of-the-art distributed paraPLL. Compared to the best shared-memory parallel algorithm, it achieves up to 9.5X speedup on a 64-node cluster.
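For readers unfamiliar with the terms above, the following is a minimal sketch of how a hub-label PPSD query works and how sequential PLL builds the labels. It is not the PLaNT algorithm and not the authors' implementation. It assumes an undirected graph with positive edge weights stored as adjacency lists; the names `query`, `pruned_landmark_labeling`, `graph`, and `order` are hypothetical and chosen only for illustration. The loop over roots also shows the sequential root-order dependency: each pruning test reads labels written by earlier roots.

```python
import heapq

INF = float("inf")

def query(labels, u, v):
    """PPSD query: minimum of d(u, h) + d(h, v) over hubs h common to both labels."""
    lu, lv = labels[u], labels[v]
    if len(lu) > len(lv):
        lu, lv = lv, lu  # iterate over the smaller label
    return min((d + lv[h] for h, d in lu.items() if h in lv), default=INF)

def pruned_landmark_labeling(graph, order):
    """Sequential PLL sketch: one pruned Dijkstra per root, in the given rank order.

    graph: dict vertex -> list of (neighbor, weight), undirected (assumption).
    order: vertices from highest to lowest rank (assumption; any total order works).
    """
    labels = {v: {} for v in graph}  # labels[v][hub] = distance(hub, v)
    for root in order:
        dist = {root: 0}
        heap = [(0, root)]
        while heap:
            d, v = heapq.heappop(heap)
            if d > dist.get(v, INF):
                continue  # stale heap entry
            # Prune: hubs from earlier roots already give a distance <= d.
            if query(labels, root, v) <= d:
                continue
            labels[v][root] = d
            for w, weight in graph[v]:
                nd = d + weight
                if nd < dist.get(w, INF):
                    dist[w] = nd
                    heapq.heappush(heap, (nd, w))
    return labels

# Toy graph: a-b (1), b-c (2), a-c (5); b is ranked first.
graph = {
    "a": [("b", 1), ("c", 5)],
    "b": [("a", 1), ("c", 2)],
    "c": [("b", 2), ("a", 5)],
}
labels = pruned_landmark_labeling(graph, order=["b", "a", "c"])
print(query(labels, "a", "c"))
```

In this toy example the search from `a` is pruned at `c`, because the earlier hub `b` already covers the pair with a shorter distance, so `a` never enters the label of `c`.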

Full Text