BL: An Efficient Index for Reachability Queries on Large Graphs

Similar Papers
  • Research Article
  • Cited by 86
  • 10.1007/s00778-011-0256-4
GRAIL: a scalable index for reachability queries in very large graphs
  • Sep 23, 2011
  • The VLDB Journal
  • Hilmi Yıldırım + 2 more

Given a large directed graph, rapidly answering reachability queries between source and target nodes is an important problem. Existing reachability methods trade off indexing time and space against query time performance. However, the biggest limitation of existing methods is that they do not scale to very large real-world graphs. We present a simple yet scalable reachability index, called GRAIL, that is based on the idea of randomized interval labeling and that can effectively handle very large graphs. Based on an extensive set of experiments, we show that while more sophisticated methods work better on small graphs, GRAIL is the only index that can scale to millions of nodes and edges. GRAIL has linear indexing time and space, and the query time ranges from constant time to being linear in the graph order and size. Our reference C++ implementations are open source and available for download at http://www.code.google.com/p/grail/.
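A minimal Python sketch of the randomized interval labeling idea described above, assuming the input is already a DAG stored as an adjacency dict (cycles would first have to be collapsed into single vertices). Each of `k` randomized post-order DFS traversals gives every vertex an interval (low, post), where low is the smallest post-order rank among the vertex and its descendants. If u reaches v, then v's interval is nested inside u's in every traversal, so a single non-nested interval proves non-reachability in constant time. Otherwise the sketch falls back to a DFS that skips children whose intervals cannot contain the target, which matches the "constant to linear" query range in the abstract. The names `build_labels`, `nested`, `reachable`, and the parameter `k` are illustrative assumptions, not taken from the GRAIL code release.

```python
import random
from typing import Dict, List, Tuple

# Sketch only: names and parameters are illustrative assumptions.
Graph = Dict[int, List[int]]   # DAG: vertex -> successors
Interval = Tuple[int, int]     # (low, post)


def build_labels(dag: Graph, k: int = 3, seed: int = 0) -> List[Dict[int, Interval]]:
    """Build k interval labelings, each from a randomized post-order DFS (assumes a DAG)."""
    rng = random.Random(seed)
    vertices = set(dag) | {w for succ in dag.values() for w in succ}
    labelings = []
    for _ in range(k):
        post: Dict[int, int] = {}
        low: Dict[int, int] = {}
        counter = 0

        def visit(v: int) -> None:
            nonlocal counter
            post[v] = 0  # placeholder so v counts as visited
            children = list(dag.get(v, []))
            rng.shuffle(children)  # randomized child order for this traversal
            for c in children:
                if c not in post:
                    visit(c)
            counter += 1
            post[v] = counter
            # low(v): smallest post-order rank among v and all of its descendants
            low[v] = min([counter] + [low[c] for c in children])

        roots = sorted(vertices)
        rng.shuffle(roots)
        for r in roots:
            if r not in post:
                visit(r)  # recursive: very deep graphs would need an iterative DFS
        labelings.append({v: (low[v], post[v]) for v in vertices})
    return labelings


def nested(inner: Interval, outer: Interval) -> bool:
    return outer[0] <= inner[0] and inner[1] <= outer[1]


def reachable(dag: Graph, labelings: List[Dict[int, Interval]], u: int, v: int) -> bool:
    if u == v:
        return True
    # Constant-time negative answer: some traversal puts v's interval outside u's.
    if any(not nested(lab[v], lab[u]) for lab in labelings):
        return False
    # Otherwise DFS from u, pruning children whose intervals cannot contain v's.
    stack, seen = [u], {u}
    while stack:
        x = stack.pop()
        for c in dag.get(x, []):
            if c == v:
                return True
            if c not in seen and all(nested(lab[v], lab[c]) for lab in labelings):
                seen.add(c)
                stack.append(c)
    return False


dag = {1: [2, 3], 2: [4], 3: [4], 5: [3]}
labels = build_labels(dag, k=3)
print(reachable(dag, labels, 1, 4))  # True
print(reachable(dag, labels, 5, 2))  # False
```

Because the fallback search is exact, the random seed only affects how often the constant-time test suffices, never the answer.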

  • Book Chapter
  • Cited by 2
  • 10.1007/978-3-319-94289-6_28
Min-Forest: Fast Reachability Indexing Approach for Large-Scale Graphs on Spark Platform
  • Jan 1, 2018
  • Liu Yang + 4 more

The reachability query is an important operation in graph databases: it asks whether one vertex can reach another through a path in the graph, and it is fundamental to real applications involving graph-shaped data. However, the increasingly large volume of data in real graph databases makes query efficiency and scalability more challenging. In this paper, we propose the Min-Forest approach to handle the reachability problem in large graphs. We present the Min-Forest structure to transform and label the original DAG, and introduce a 4-tuple labeling scheme that builds an index entry for each vertex, integrating interval labels for trees with non-tree labels. We design efficient reachability query algorithms for the Min-Forest approach on the Spark cloud platform. The experimental results show that the average query time of the Min-Forest approach is about 10⁻⁴ ms on large dense graphs, and that both query time and index construction time are linear on sparse and dense graphs alike. It answers reachability queries much faster than state-of-the-art approaches on real graph databases, especially large and dense ones.
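The abstract combines interval labels on trees with labels for non-tree edges. The 4-tuple scheme itself is not described here, so the following is only a generic Python sketch of that combination under stated assumptions: a DFS spanning forest gives each vertex a contiguous pre-order range covering its tree subtree, and every edge left out of the forest is stored in a list sorted by the pre-order rank of its source. A query succeeds as soon as the target falls inside the current vertex's subtree range; otherwise it jumps only along non-tree edges leaving that subtree. The class `TreeCoverIndex` and its methods are hypothetical, and nothing about the Spark distribution is modelled.

```python
import bisect
from typing import Dict, List, Set, Tuple

# Sketch only: a generic tree-cover index, not the Min-Forest 4-tuple scheme.
Graph = Dict[int, List[int]]  # directed: vertex -> successors


class TreeCoverIndex:
    """Interval labels on a DFS spanning forest plus a sorted list of non-tree edges."""

    def __init__(self, graph: Graph) -> None:
        vertices = sorted(set(graph) | {w for succ in graph.values() for w in succ})
        self.pre: Dict[int, int] = {}   # pre-order rank in the spanning forest
        self.size: Dict[int, int] = {}  # number of vertices in the tree subtree
        tree_edges: Set[Tuple[int, int]] = set()
        counter = 0

        def dfs(v: int) -> None:
            nonlocal counter
            self.pre[v] = counter
            counter += 1
            for c in graph.get(v, []):
                if c not in self.pre:
                    tree_edges.add((v, c))
                    dfs(c)  # recursive: very deep graphs would need an iterative DFS
            self.size[v] = counter - self.pre[v]

        for v in vertices:
            if v not in self.pre:
                dfs(v)
        # Edges outside the forest, sorted by the pre-order rank of their source.
        non_tree = sorted((self.pre[a], b) for a, succ in graph.items()
                          for b in succ if (a, b) not in tree_edges)
        self.nt_src = [p for p, _ in non_tree]
        self.nt_dst = [b for _, b in non_tree]

    def in_subtree(self, x: int, v: int) -> bool:
        """A tree subtree occupies a contiguous range of pre-order ranks."""
        return self.pre[x] <= self.pre[v] < self.pre[x] + self.size[x]

    def reachable(self, u: int, v: int) -> bool:
        stack, seen = [u], {u}
        while stack:
            x = stack.pop()
            if self.in_subtree(x, v):
                return True  # v is reachable from x via tree edges alone
            # Follow only the non-tree edges whose source lies in x's subtree.
            lo = bisect.bisect_left(self.nt_src, self.pre[x])
            hi = bisect.bisect_left(self.nt_src, self.pre[x] + self.size[x])
            for b in self.nt_dst[lo:hi]:
                if b not in seen:
                    seen.add(b)
                    stack.append(b)
        return False


idx = TreeCoverIndex({1: [2, 3], 2: [4], 3: [4], 5: [3]})
print(idx.reachable(5, 4))  # True
print(idx.reachable(2, 3))  # False
```

The search is correct for any directed graph: a path from x either stays inside x's tree subtree or leaves it through a non-tree edge whose source is in that subtree.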

  • Research Article
  • Cited by 41
  • 10.1016/j.is.2013.10.003
Efficient processing of label-constraint reachability queries in large graphs
  • Oct 18, 2013
  • Information Systems
  • Lei Zou + 5 more

  • Conference Article
  • Cited by 19
  • 10.1145/2063576.2063807
Answering label-constraint reachability in large graphs
  • Oct 24, 2011
  • Kun Xu + 5 more

In this paper, we study a variant of reachability queries called label-constraint reachability (LCR) queries. Specifically, given a label set S and two vertices u1 and u2 in a large directed graph G, we verify whether there exists a path from u1 to u2 under label constraint S. Like traditional reachability queries, LCR queries are very useful in applications such as pathway finding in biological networks, inference over RDF (Resource Description Framework) graphs, and relationship finding in social networks. However, LCR queries are much more complicated than their traditional counterpart. Several techniques are proposed in this paper to minimize the search space in computing the path-label transitive closure. Furthermore, we demonstrate the superiority of our method through extensive experiments.
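For reference, an LCR query can always be answered without any index by a breadth-first search that only follows edges whose labels belong to S; the techniques in the paper aim to avoid exactly this kind of unrestricted search on large graphs. A minimal Python sketch of that baseline, assuming edges are stored as (label, successor) pairs. The function name `lcr_reachable` and the example labels are hypothetical.

```python
from collections import deque
from typing import Dict, List, Set, Tuple

# Sketch only: index-free baseline; names and labels are hypothetical.
LabeledGraph = Dict[str, List[Tuple[str, str]]]  # vertex -> [(edge label, successor)]


def lcr_reachable(g: LabeledGraph, u1: str, u2: str, allowed: Set[str]) -> bool:
    """True if some path from u1 to u2 uses only edges whose labels are in `allowed`."""
    if u1 == u2:
        return True  # convention: the empty path satisfies any label constraint
    seen = {u1}
    queue = deque([u1])
    while queue:
        x = queue.popleft()
        for label, y in g.get(x, []):
            if label not in allowed or y in seen:
                continue  # edge violates the constraint, or y was already explored
            if y == u2:
                return True
            seen.add(y)
            queue.append(y)
    return False


g = {
    "alice": [("friend", "bob"), ("colleague", "carol")],
    "bob": [("friend", "dave")],
    "carol": [("colleague", "dave")],
}
print(lcr_reachable(g, "alice", "dave", {"friend"}))   # True
print(lcr_reachable(g, "alice", "carol", {"friend"}))  # False
```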

  • Book Chapter
  • Cited by 1
  • 10.1007/978-3-030-60259-8_11
Fruited-Forest: A Reachability Querying Method Based on Spanning Tree Modelling of Reduced DAG
  • Jan 1, 2020
  • Liu Yang + 5 more

A reachability query is a fundamental graph operation in real graph applications, which answers whether a node can reach another node through a path in a graph. However, the increasingly large amounts of real graph data make query efficiency and scalability more challenging. In this paper, we propose a Fruited-Forest (FF) approach to accelerate reachability queries in large graphs by constructing four kinds of fruited-forests from a reduced DAG in different traversal orders. We build different binary-label schemes for the four kinds of fruited-forests to cover as much of the reachability between nodes as possible, and create a corresponding index for the edges deleted during the construction of the fruited-forests. Our experimental results on 18 large real graph datasets show that our FF approach requires less index construction time and a smaller index size than other existing works, and is more scalable for answering reachability queries.

  • Conference Article
  • Cited by 43
  • 10.1109/icde.2012.129
Horton: Online Query Execution Engine for Large Distributed Graphs
  • Apr 1, 2012
  • Mohamed Sarwat + 3 more

Graphs are used in many large-scale applications, such as social networking. The management of these graphs poses new challenges, as such graphs are too large for a single server to manage efficiently. Current distributed techniques such as map-reduce and Pregel are not well-suited to processing interactive ad-hoc queries against large graphs. In this paper we demonstrate Horton, a distributed interactive query execution engine for large graphs. Horton defines a query language that allows the expression of regular language reachability queries and provides a query execution engine with a query optimizer that allows interactive execution of queries on large distributed graphs in parallel. In the demo, we show the functionality of Horton managing a large graph for a social networking application called Codebook, whose graph represents data on software components, developers, development artifacts such as bug reports, and their interactions in large software projects.
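Regular-language reachability queries like those Horton supports are commonly evaluated by searching the product of the graph with a finite automaton for the path expression: a state is a (vertex, automaton state) pair, and an edge may be followed only if the automaton has a transition on its label. The Python sketch below illustrates that general technique on a single machine; it is not Horton's distributed engine or query language. It assumes edge-labeled adjacency lists and a deterministic automaton given as a transition dict, and the labels `authored` and `depends_on` are hypothetical stand-ins for Codebook-style relationships.

```python
from collections import deque
from typing import Dict, List, Set, Tuple

# Sketch only: generic product-graph search; labels and names are hypothetical.
LabeledGraph = Dict[str, List[Tuple[str, str]]]  # vertex -> [(edge label, successor)]
Transitions = Dict[Tuple[int, str], int]          # (state, edge label) -> next state


def rpq_reachable(g: LabeledGraph, delta: Transitions, start: int,
                  accepting: Set[int], src: str, dst: str) -> bool:
    """BFS over (vertex, automaton state) pairs of the product graph."""
    first = (src, start)
    seen = {first}
    queue = deque([first])
    while queue:
        v, q = queue.popleft()
        if v == dst and q in accepting:
            return True  # the edge labels along this path spell an accepted word
        for label, w in g.get(v, []):
            nq = delta.get((q, label))
            if nq is None:
                continue  # the automaton has no transition on this label
            if (w, nq) not in seen:
                seen.add((w, nq))
                queue.append((w, nq))
    return False


# Hypothetical expression: authored depends_on*
delta = {(0, "authored"): 1, (1, "depends_on"): 1}
g = {
    "alice": [("authored", "libA")],
    "libA": [("depends_on", "libB")],
    "bob": [("authored", "libC")],
}
print(rpq_reachable(g, delta, 0, {1}, "alice", "libB"))  # True
print(rpq_reachable(g, delta, 0, {1}, "bob", "libB"))    # False
```

The product search visits each (vertex, state) pair at most once, so its cost grows with the graph size times the number of automaton states.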

  • Research Article
  • Cited by 31
  • 10.1007/s00778-011-0238-6
Answering pattern match queries in large graph databases via graph embedding
  • Jun 7, 2011
  • The VLDB Journal
  • Lei Zou + 3 more

The growing popularity of graph databases has generated interesting data management problems, such as subgraph search, shortest path queries, reachability verification, and pattern matching. Among these, a pattern match query is more flexible than a subgraph search and more informative than a shortest path or reachability query. In this paper, we address distance-based pattern match queries over a large data graph G. Due to the huge search space, we adopt a filter-and-refine framework to answer a pattern match query over a large graph. We first find a set of candidate matches by a graph embedding technique and then evaluate these to find the exact matches. Extensive experiments confirm the superiority of our method.
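The filter-and-refine idea can be illustrated with a simple distance-based embedding: represent each vertex by its shortest-path distances to a few anchor vertices. By the triangle inequality, |d(a,u) - d(a,v)| <= d(u,v) for every anchor a, so the largest such difference is a cheap lower bound on d(u,v). Candidate pairs whose lower bound already exceeds the distance threshold are discarded, and only the survivors are checked exactly. This Python sketch assumes an unweighted, undirected graph and a single pairwise distance constraint; the anchor choice and the function names are illustrative, and it does not reproduce the paper's specific embedding.

```python
from collections import deque
from typing import Dict, List, Sequence, Tuple

# Sketch only: generic anchor-distance embedding; names are hypothetical.
Graph = Dict[int, List[int]]  # undirected: each edge listed in both directions
INF = float("inf")


def bfs_dist(g: Graph, s: int) -> Dict[int, int]:
    """Unweighted shortest-path distances from s."""
    dist = {s: 0}
    queue = deque([s])
    while queue:
        x = queue.popleft()
        for y in g.get(x, []):
            if y not in dist:
                dist[y] = dist[x] + 1
                queue.append(y)
    return dist


def embed(g: Graph, anchors: Sequence[int]) -> Dict[int, List[float]]:
    """Map each vertex to its vector of distances to the anchor vertices."""
    tables = [bfs_dist(g, a) for a in anchors]
    return {v: [t.get(v, INF) for t in tables] for v in g}


def lower_bound(eu: List[float], ev: List[float]) -> float:
    """max over anchors of |d(a,u) - d(a,v)|, which never exceeds d(u,v)."""
    best = 0.0
    for du, dv in zip(eu, ev):
        if du != dv:  # equal values (including both infinite) give no information
            best = max(best, abs(du - dv))
    return best


def pairs_within(g: Graph, anchors: Sequence[int],
                 pairs: List[Tuple[int, int]], delta: int) -> List[Tuple[int, int]]:
    emb = embed(g, anchors)
    result = []
    for u, v in pairs:
        if lower_bound(emb[u], emb[v]) > delta:
            continue  # filter: provably farther apart than delta
        if bfs_dist(g, u).get(v, INF) <= delta:  # refine: exact distance check
            result.append((u, v))
    return result


path = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}
print(pairs_within(path, [0], [(0, 4), (1, 2), (2, 4)], 2))  # [(1, 2), (2, 4)]
```

In this example the pair (0, 4) is rejected by the embedding alone, without running the exact search.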

  • Conference Article
  • Cited by 68
  • 10.1145/2213836.2213888
Efficient processing of distance queries in large graphs
  • May 20, 2012
  • James Cheng + 3 more

We propose a novel disk-based index for processing single-source shortest path or distance queries. The index is useful in a wide range of important applications (e.g., network analysis, route planning, etc.). Our index is a tree-structured index constructed based on the concept of vertex cover. We propose an I/O-efficient algorithm to construct the index when the input graph is too large to fit in main memory. We give a detailed analysis of I/O and CPU complexity for both index construction and query processing, and verify the efficiency of our index for query processing in massive real-world graphs.

  • Book Chapter
  • Cited by 75
  • 10.1007/978-1-4419-6045-0_6
Graph Reachability Queries: A Survey
  • Jan 1, 2010
  • Jeffrey Xu Yu + 1 more

There are numerous applications that need to deal with a large graph, including bioinformatics, social science, link analysis, citation analysis, and collaborative networks. A fundamental query asks whether a node is reachable from another node in a large graph; this is called a reachability query. In this survey, we discuss several existing approaches to processing reachability queries. In addition, we discuss how to answer reachability queries with the shortest distance, and graph pattern matching over a large graph.
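The approaches such a survey covers sit between two extremes, sketched below in Python as a point of reference: answering every query with an online search (no index, O(|V| + |E|) per query) or precomputing the full transitive closure (one search per vertex, up to |V|² stored pairs, then constant expected-time lookups). The function names are illustrative assumptions, and sets of reachable vertices stand in for the bit matrices a real closure index would more likely use.

```python
from collections import deque
from typing import Dict, List, Set

# Sketch only: the two baseline extremes; names are hypothetical.
Graph = Dict[int, List[int]]  # directed: vertex -> successors


def bfs_reachable(g: Graph, s: int, t: int) -> bool:
    """No index at all: one O(|V| + |E|) search per query."""
    seen = {s}
    queue = deque([s])
    while queue:
        x = queue.popleft()
        if x == t:
            return True
        for y in g.get(x, []):
            if y not in seen:
                seen.add(y)
                queue.append(y)
    return False


def reach_set(g: Graph, s: int) -> Set[int]:
    """All vertices reachable from s, including s itself."""
    seen = {s}
    queue = deque([s])
    while queue:
        for y in g.get(queue.popleft(), []):
            if y not in seen:
                seen.add(y)
                queue.append(y)
    return seen


def transitive_closure(g: Graph) -> Dict[int, Set[int]]:
    """Full index: one search per vertex, up to |V|^2 stored pairs."""
    vertices = set(g) | {w for succ in g.values() for w in succ}
    return {s: reach_set(g, s) for s in vertices}


g = {1: [2], 2: [3], 4: [1]}
closure = transitive_closure(g)
print(bfs_reachable(g, 4, 3), 3 in closure[4])  # True True
```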

  • Book Chapter
  • Cited by 2
  • 10.1007/978-3-319-25159-2_21
Interval-Index: A Scalable and Fast Approach for Reachability Queries in Large Graphs
  • Jan 1, 2015
  • Fangxu Li + 2 more

More and more large graphs are now available. One interesting problem is how to effectively determine reachability between any pair of vertices in a very large graph. Multiple approaches have been proposed to answer reachability queries; however, most of them only perform well on small graphs. Processing reachability queries on large graphs requires substantial storage and computation and remains challenging. In this paper, we propose a scalable and fast indexing approach called Interval-Index, based on traversal tree-based partitioning and a relabeling scheme. Our approach has several unique features: first, the traversal tree-based partitioning ensures access locality and parallelism in computation; second, continuous relabeling ensures fast querying and reduces the search space; third, we convert the entire graph database into a smaller traversal tree graph to obtain a compact storage structure. Finally, we run extensive experiments on synthetic and real graphs of different sizes and show that the Interval-Index approach outperforms the state-of-the-art Feline in both storage size and query execution performance.

  • Book Chapter
  • Cited by 26
  • 10.1007/978-3-642-12026-8_13
NOVA: A Novel and Efficient Framework for Finding Subgraph Isomorphism Mappings in Large Graphs
  • Jan 1, 2010
  • Ke Zhu + 4 more

Considerable effort has been spent studying the subgraph problem. A traditional subgraph containment query retrieves all database graphs that contain the query graph g. A variation is to find all occurrences of a particular pattern (the query) in a large database graph, which we call the subgraph matching problem. The state-of-the-art solution to this problem is GADDI. In this paper, we propose a more efficient index and algorithm to answer the subgraph matching problem. The index is based on the label distribution of neighbourhood vertices and is structured as a multi-dimensional vector signature. A novel algorithm is also proposed to further speed up the isomorphic enumeration process. This algorithm attempts to maximize computational sharing. It also uses an eager pruning strategy to predict which enumeration states cannot lead to a final answer. We have performed extensive experiments to demonstrate the efficiency and effectiveness of our technique.
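A neighbourhood label-distribution signature can be pictured as a vector of label counts per vertex. If query vertex u is mapped to data vertex v, u's neighbours must map to distinct neighbours of v with the same labels, so v's count for every label must be at least u's; data vertices failing this test can be pruned before any enumeration starts. The Python sketch below shows only this one-hop filter, as an illustration under assumptions: NOVA's actual signature, neighbourhood radius, and enumeration algorithm are not reproduced, and `signatures` and `candidates` are hypothetical names.

```python
from collections import Counter
from typing import Dict, List

# Sketch only: one-hop label-count filter; names are hypothetical.
Graph = Dict[int, List[int]]  # undirected adjacency lists
Labels = Dict[int, str]       # vertex -> vertex label


def signatures(g: Graph, labels: Labels) -> Dict[int, Counter]:
    """Per-vertex count of each label among its direct neighbours."""
    return {v: Counter(labels[w] for w in g.get(v, [])) for v in labels}


def candidates(q: Graph, q_labels: Labels, g: Graph, g_labels: Labels) -> Dict[int, List[int]]:
    """Data vertices passing the label test and signature dominance for each query vertex."""
    q_sig = signatures(q, q_labels)
    g_sig = signatures(g, g_labels)
    return {
        u: [v for v in g_labels
            if g_labels[v] == q_labels[u]
            # Counter returns 0 for labels that v's neighbourhood lacks
            and all(g_sig[v][lab] >= cnt for lab, cnt in q_sig[u].items())]
        for u in q_labels
    }


q, q_labels = {0: [1], 1: [0]}, {0: "A", 1: "B"}
g = {10: [11], 11: [10, 12], 12: [11], 13: []}
g_labels = {10: "A", 11: "B", 12: "A", 13: "A"}
print(candidates(q, q_labels, g, g_labels))  # {0: [10, 12], 1: [11]}
```

Vertex 13 carries the right label but has no "B" neighbour, so it is pruned as a candidate for query vertex 0.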

  • Book Chapter
  • Cited by 14
  • 10.1007/978-3-642-00887-0_12
A Uniform Framework for Ad-Hoc Indexes to Answer Reachability Queries on Large Graphs
  • Jan 1, 2009
  • Linhong Zhu + 4 more

Graph-structured databases and related problems such as reachability query processing have been increasingly relevant to many applications such as XML databases, biological databases, social network analysis and the Semantic Web. To efficiently evaluate reachability queries on large graph-structured databases, there has been a host of recent research on graph indexing. To date, reachability indexes are generally applied to the entire graph. This can often be suboptimal if the graph is large and/or its subgraphs are diverse in structure. In this paper, we propose a uniform framework to support existing reachability indexing for subgraphs of a given graph. This in turn supports fast reachability query processing in large graph-structured databases. The contributions of our uniform framework are as follows: (1) We formally define a graph framework that facilitates indexing subgraphs, as opposed to the entire graph. (2) We propose a heuristic algorithm to partition a given graph into subgraphs for indexing. (3) We demonstrate how reachability queries are evaluated in the graph framework. Our preliminary experimental results showed that the framework yields a smaller total index size and is more efficient in processing reachability queries on large graphs than a fixed index scheme on the entire graph.

  • Research Article
  • Cited by 123
  • 10.14778/1920841.1920879
GRAIL
  • Sep 1, 2010
  • Proceedings of the VLDB Endowment
  • Hilmi Yildirim + 2 more

Given a large directed graph, rapidly answering reachability queries between source and target nodes is an important problem. Existing reachability methods trade off indexing time and space against query time performance. However, the biggest limitation of existing methods is that they simply do not scale to very large real-world graphs. We present a very simple, but scalable reachability index, called GRAIL, that is based on the idea of randomized interval labeling, and that can effectively handle very large graphs. Based on an extensive set of experiments, we show that while more sophisticated methods work better on small graphs, GRAIL is the only index that can scale to millions of nodes and edges. GRAIL has linear indexing time and space, and the query time ranges from constant time to being linear in the graph order and size.

  • Conference Article
  • Cited by 6
  • 10.1109/bigdata.2015.7363833
KeyLabel algorithms for keyword search in large graphs
  • Oct 1, 2015
  • Yue Wang + 3 more

Graph keyword search is the process of extracting small subgraphs that contain a set of query keywords from a graph. This problem is challenging because there are many constraints, including distance constraints, keyword constraints, search time constraints, index size constraints, and memory constraints, while data volumes are growing very rapidly. Existing greedy algorithms guarantee good performance by sacrificing accuracy to generate approximate answers, and exact algorithms promise exact answers but require high memory consumption for loading indices and advance knowledge of the maximum distance constraint. For big data applications, existing techniques are inefficient and impractical due to huge memory consumption and varied distance constraints. We propose a new keyword search algorithm that finds exact answers with low memory consumption and without advance knowledge of the maximum distance constraint. This algorithm builds a compact index structure offline based on a recent labeling index for shortest path queries. At query time, it finds the answer efficiently by examining a small portion of the index related to a query.

  • Conference Article
  • Cited by 157
  • 10.1145/1376616.1376677
Efficiently answering reachability queries on very large directed graphs
  • Jun 9, 2008
  • Ruoming Jin + 3 more

Efficiently processing queries against very large graphs is an important research topic largely driven by emerging real-world applications as diverse as XML databases, GIS, web mining, social network analysis, ontologies, and bioinformatics. In particular, graph reachability has attracted a lot of research attention, as reachability queries are not only common on graph databases but also serve as fundamental operations for many other graph queries. The main idea behind answering reachability queries in graphs is to build indices based on reachability labels. Essentially, each vertex in the graph is assigned labels such that the reachability between any two vertices can be determined by their labels. Several approaches have been proposed for building these reachability labels; among them are interval labeling (tree cover) and 2-hop labeling. However, due to the large number of vertices in many real-world graphs (some graphs can easily contain millions of vertices), the computational cost and (index) size of the labels produced by existing methods would prove too expensive to be practical. In this paper, we introduce a novel graph structure, referred to as path-tree, to help label very large graphs. The path-tree cover is a spanning subgraph of G in a tree shape. We demonstrate both analytically and empirically the effectiveness of our new approaches.
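To make the 2-hop labeling idea mentioned above concrete: each vertex v stores a set L_out(v) of hubs it reaches and a set L_in(v) of hubs that reach it, and u reaches v exactly when L_out(u) and L_in(v) share a hub. The Python sketch below builds such labels with pruned breadth-first searches in the style of pruned landmark labeling, skipping any vertex whose reachability to or from the current hub is already implied by earlier hubs. It illustrates 2-hop labeling only, not the path-tree method proposed in the paper; the degree-based hub order and the function names are assumptions.

```python
from collections import deque
from typing import Dict, List, Set, Tuple

# Sketch only: pruned-BFS 2-hop labels; hub order and names are assumptions.
Graph = Dict[int, List[int]]  # directed: vertex -> successors
Labels = Dict[int, Set[int]]


def build_2hop(g: Graph) -> Tuple[Labels, Labels]:
    vertices = set(g) | {w for succ in g.values() for w in succ}
    out_adj = {v: list(g.get(v, [])) for v in vertices}
    in_adj: Dict[int, List[int]] = {v: [] for v in vertices}
    for a in vertices:
        for b in out_adj[a]:
            in_adj[b].append(a)
    l_in: Labels = {v: set() for v in vertices}   # hubs that reach v
    l_out: Labels = {v: set() for v in vertices}  # hubs reachable from v

    def covered(a: int, b: int) -> bool:
        return not l_out[a].isdisjoint(l_in[b])

    # Assumed heuristic: vertices with more incident edges become hubs first.
    order = sorted(vertices, key=lambda v: (-(len(out_adj[v]) + len(in_adj[v])), v))
    for h in order:
        for forward in (True, False):
            adj = out_adj if forward else in_adj
            seen = {h}
            queue = deque([h])
            while queue:
                w = queue.popleft()
                # Prune: an earlier hub already answers this (h, w) or (w, h) pair.
                if (covered(h, w) if forward else covered(w, h)):
                    continue
                (l_in if forward else l_out)[w].add(h)
                for x in adj[w]:
                    if x not in seen:
                        seen.add(x)
                        queue.append(x)
    return l_in, l_out


def reachable(l_in: Labels, l_out: Labels, u: int, v: int) -> bool:
    """u reaches v iff some hub lies in both L_out(u) and L_in(v)."""
    return not l_out[u].isdisjoint(l_in[v])


l_in, l_out = build_2hop({1: [2], 2: [3], 3: [1], 4: [3], 5: []})
print(reachable(l_in, l_out, 4, 1), reachable(l_in, l_out, 1, 4))  # True False
```

Pruning is what keeps the labels small: once an early, well-connected hub covers a pair, later hubs never record that pair again.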
