An in-depth study of continuous subgraph matching

Xibo Sun,Bingsheng He,Qiong Luo,Shixuan Sun

doi:10.14778/3523210.3523218

Abstract

Continuous subgraph matching (CSM) algorithms find the occurrences of a given pattern on a stream of data graphs online. A number of incremental CSM algorithms have been proposed. However, a systematical study on these algorithms is missing to identify their advantages and disadvantages on a wide range of workloads. Therefore, we first propose to model CSM as incremental view maintenance (IVM) to capture the design space of existing algorithms. Then, we implement six representative CSM algorithms, including InclsoMatch, SJ-Tree, Graphflow, IEDyn, TurboFlux, and SymBi, in a common framework based on IVM. We further conduct extensive experiments to evaluate the overall performance of competing algorithms as well as study the effectiveness of individual techniques to pinpoint the key factors leading to the performance differences. We obtain the following new insights into the performance: (1) existing algorithms start the search from an edge in the query graph that maps to an updated data edge, potentially leading to many invalid partial results; (2) all matching orders are based on simple heuristics, which appear ineffective at times; (3) index updates dominate the query time on some queries; and (4) the algorithm with constant delay enumeration bears significant index update cost. Consequently, no algorithm dominate the others in all cases. Therefore, we give a few recommendations based on our experiment results. In particular, the SymBi index is useful for sparse queries or long running queries. The matching orders of IEDyn and TurboFlux work well on tree queries, those of Graphflow on dense queries or when both query and data graphs are sparse, and otherwise, we recommend SymBi's matching orders.

Full Text