Abstract

The set reachability query in directed graphs has many graph-based applications, such as dependency analysis and graph centrality calculation. Given two sets <inline-formula><tex-math notation="LaTeX">$S$</tex-math></inline-formula> and <inline-formula><tex-math notation="LaTeX">$T$</tex-math></inline-formula> of source and target vertices, a set reachability query returns all pairs <inline-formula><tex-math notation="LaTeX">$(s,t)$</tex-math></inline-formula> where <inline-formula><tex-math notation="LaTeX">$s{\in }S$</tex-math></inline-formula>, <inline-formula><tex-math notation="LaTeX">$t{\in }T$</tex-math></inline-formula>, and <inline-formula><tex-math notation="LaTeX">$s$</tex-math></inline-formula> can reach <inline-formula><tex-math notation="LaTeX">$t$</tex-math></inline-formula>. The state-of-the-art approach, distributed set reachability (DSR), investigates the set reachability query in a distributed environment and adopts a static graph-based index to improve query efficiency. Nevertheless, DSR needs to store the graph-based index in all partitions, which causes a huge space overhead. Furthermore, it cannot efficiently answer a negative query <inline-formula><tex-math notation="LaTeX">$(s,t)$</tex-math></inline-formula>, where <inline-formula><tex-math notation="LaTeX">$s$</tex-math></inline-formula> cannot reach <inline-formula><tex-math notation="LaTeX">$t$</tex-math></inline-formula>, since DSR must traverse all reachable paths and therefore cannot efficiently reduce the computation. To alleviate these issues, we propose a novel multi-level 2-hop (ML2hop) index for the set reachability query in a distributed environment. Based on ML2hop, we further present a bi-directional query algorithm, called MLQA, to efficiently support both positive and negative queries in Pregel-like systems. MLQA has three significant properties: (1) Low computation costs. It reduces redundant local computations in each partition by controlling the rounds of path traversals. (2) Low communication costs. It restricts the message exchange among different partitions to a single round while guaranteeing the accuracy of query results. (3) High parallelism. It adopts a bi-directional query technique for message propagation, achieving better query efficiency than the forward-traversal query strategy used in DSR. Experimental results on several real-world graphs demonstrate that MLQA significantly outperforms the state-of-the-art algorithm, by up to two orders of magnitude.
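
To make the query semantics concrete, the following is a minimal single-machine sketch of a set reachability query, answered by a plain forward breadth-first search from each source. It is only an illustrative baseline and is not DSR, ML2hop, or MLQA; the function name `set_reachability`, the adjacency-dictionary representation, and the toy graph are assumptions introduced here.

```python
from collections import deque


def set_reachability(graph, sources, targets):
    """Return all pairs (s, t) with s in sources, t in targets, and s reaching t.

    `graph` is a hypothetical adjacency dictionary mapping each vertex to a
    list of its out-neighbours. A vertex is treated as reaching itself, which
    is an assumption of this sketch.
    """
    targets = set(targets)
    pairs = set()
    for s in sources:
        # Standard forward BFS from s over the directed edges.
        seen = {s}
        queue = deque([s])
        while queue:
            v = queue.popleft()
            if v in targets:
                pairs.add((s, v))
            for w in graph.get(v, ()):
                if w not in seen:
                    seen.add(w)
                    queue.append(w)
    return pairs


# Toy directed graph: a -> b -> c, d -> c, and an isolated vertex e.
graph = {"a": ["b"], "b": ["c"], "c": [], "d": ["c"], "e": []}
result = set_reachability(graph, ["a", "d", "e"], ["b", "c"])
print(sorted(result))
```

In this toy graph, the pairs `(d, b)`, `(e, b)`, and `(e, c)` are negative queries: the search only learns that they fail after exhausting everything reachable from the source, which is the cost that forward-only traversal pays on negative queries.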
