Towards Efficient Distributed Subgraph Enumeration Via Backtracking-Based Framework

Zhaokang Wang,Chunfeng Yuan,Rong Gu,Guowang Chen,Yihua Huang,Weiwei Hu

doi:10.1109/tpds.2021.3076246

Abstract

Finding or monitoring subgraph instances that are isomorphic to a given pattern graph in a data graph is a fundamental query operation in many graph analytic applications, such as network motif mining and fraud detection. Existing distributed methods are inefficient in communication. They have to shuffle partial matching results during the distributed multiway join. The partial matching results may be much larger than the data graph itself. To overcome the drawback, we develop the Batch-BENU framework for distributed subgraph enumeration on static data graphs. Batch-BENU executes a group of local search tasks in parallel. Each task enumerates subgraphs around a vertex in the data graph, guided by a backtracking-based execution plan. To handle large-scale data graphs that may exceed the memory capacity of a single machine, Batch-BENU stores the data graph in a distributed database. Each task queries adjacency sets of the data graph on demand, shuffling the data graph instead of partial matching results. To support incremental subgraph enumeration on dynamic data graphs, we propose the Streaming-BENU framework. Streaming-BENU turns the problem of enumerating incremental matching results into enumerating all matching results of incremental pattern graphs at each time step. We implement Batch-BENU and Streaming-BENU with the local database cache and the load balance optimization to improve their efficiency. Extensive experiments show that Batch-BENU and Streaming-BENU can scale to big graphs and complex pattern graphs. They outperform the state-of-the-art distributed methods by up to one and two orders of magnitude, respectively.

Full Text