In this paper, we study the problem of label-constrained reachability (LCR) queries, which are fundamental in many applications over directed edge-labeled graphs. Although the classical reachability query (i.e., the reachability query without label constraints) has been extensively studied, the LCR query is much more challenging because the number of possible label-constraint sets is exponential in the number of labels. We observe that existing techniques for LCR queries construct only a partial index for better scalability; their worst-case query time is not guaranteed and can be the same as that of an online breadth-first search (BFS). We propose novel label-constrained 2-hop indexing techniques with novel pruning rules and ordering strategies. We show that the worst-case query time of our approach is bounded by the in-out index entry size. Comprehensive experiments show that our proposed methods significantly outperform the state-of-the-art technique in terms of query response time (up to 5 orders of magnitude speedup), index size, and index construction time. In particular, our proposed method can answer LCR queries at the microsecond level over billion-scale graphs on a single machine.
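
To make the query semantics concrete, the following is a minimal sketch of the online BFS baseline mentioned above, not the proposed 2-hop index. It assumes a hypothetical adjacency-list representation in which each vertex maps to a list of `(neighbor, label)` pairs; the function name `lcr_bfs` and the toy graph are illustrative assumptions, not taken from the paper.

```python
from collections import deque


def lcr_bfs(graph, source, target, allowed_labels):
    """Online BFS baseline for an LCR query (illustrative sketch only).

    graph: dict mapping a vertex to a list of (neighbor, label) pairs
           (an assumed representation, not the paper's data structure).
    Returns True if target is reachable from source along a directed path
    whose edge labels all belong to allowed_labels.
    """
    if source == target:
        return True
    visited = {source}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v, label in graph.get(u, ()):
            # Skip edges whose label violates the constraint.
            if label not in allowed_labels or v in visited:
                continue
            if v == target:
                return True
            visited.add(v)
            queue.append(v)
    return False


# Hypothetical toy graph: both a -> d paths mix the two labels.
graph = {
    "a": [("b", "friend"), ("c", "follows")],
    "b": [("d", "follows")],
    "c": [("d", "friend")],
}

print(lcr_bfs(graph, "a", "d", {"friend"}))
print(lcr_bfs(graph, "a", "d", {"friend", "follows"}))
```

In this toy graph, `d` is reachable from `a` only when both labels are allowed, which illustrates why the answer depends on the label set and why an index must account for up to 2^|L| label sets for a label alphabet L. Each such query costs time linear in the graph size in the worst case, which is the cost an index-based approach aims to avoid.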