Abstract

Answering queries using views has proven effective for querying relational and semistructured data. This paper investigates the same issue for graph pattern queries based on graph simulation. We propose a notion of pattern containment to characterize graph pattern matching using graph pattern views, and show that a pattern query can be answered using a set of views if and only if it is contained in the views. Based on this characterization, we develop efficient algorithms to answer graph pattern queries. We also study the problems of determining containment, minimal containment, and minimum containment of pattern queries. We establish their complexity (from cubic time to NP-complete) and provide efficient checking algorithms (approximation algorithms when the problem is intractable). In addition, when a pattern query is not contained in the views, we study maximally contained rewriting to find approximate answers; we show that such a rewriting can be computed in cubic time, and present a rewriting algorithm. We experimentally verify that these methods efficiently answer pattern queries on large real-world graphs.

Highlights

  • Answering queries using views has been extensively studied for relational data [27], [33], XML [30], [50], [51] and semistructured data [11], [43], [52]

  • We focus on graph pattern matching defined in terms of graph simulation [28], since it is commonly used in social community detection [10], biological analysis [35], and mobile network analyses [24]

  • To characterize when graph pattern queries can be answered using views based on graph simulation, we propose a notion of pattern containment (Section 3)


Summary

INTRODUCTION

Answering queries using views has been extensively studied for relational data [27], [33], XML [30], [50], [51] and semistructured data [11], [43], [52]. This work extends [21] by including new proofs, results and experimental study: (1) proofs for the pattern containment characterization (Section 3); (2) proofs of the fundamental problems for pattern containment (Section 4); (3) algorithms contain and minimum (Section 5); (4) results and proofs for maximally contained rewriting for graph pattern matching (Section 6), a topic not studied in [21]; and (5) two sets of new experiments (Section 7): one evaluating the effectiveness of our approach using graphs with billions of nodes and edges [3], and the other evaluating the efficiency and accuracy of approximate query answering by means of maximally contained rewriting. Maximally contained views can be combined with access constraints [12] to compute exact query answers following [17]. Taken together, these methods yield a promising approach to querying “big” graphs. We state the problem of pattern matching using views.
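As a rough illustration of graph simulation, the matching semantics on which these pattern queries are based, the following Python sketch computes the maximum simulation relation between a small pattern and a data graph by naive fixpoint refinement. It is not the paper's algorithm and does not use views: the function name `graph_simulation`, the dictionary-based graph encoding, and the toy labels are all assumptions made for this example, and no attempt is made to reach the best known time bounds.

```python
def graph_simulation(p_labels, p_edges, g_labels, g_edges):
    """Naive graph simulation (hypothetical helper, for illustration only).

    p_labels / g_labels: dict mapping node id -> label
    p_edges / g_edges:   iterable of (source, target) pairs
    Returns {pattern_node: set(data_nodes)} for the maximum simulation,
    or {} if some pattern node ends up with no match.
    """
    # Successor sets of the data graph.
    succ = {v: set() for v in g_labels}
    for v, w in g_edges:
        succ[v].add(w)

    # Start from all data nodes that carry the same label as the pattern node.
    sim = {
        u: {v for v, lab in g_labels.items() if lab == p_labels[u]}
        for u in p_labels
    }

    # Refine until stable: v may match u only if, for every pattern edge
    # (u, u2), v has some successor that still matches u2.
    changed = True
    while changed:
        changed = False
        for u, u2 in p_edges:
            keep = {v for v in sim[u] if succ[v] & sim[u2]}
            if keep != sim[u]:
                sim[u] = keep
                changed = True

    # The pattern matches only if every pattern node has at least one match.
    if any(not matches for matches in sim.values()):
        return {}
    return sim


if __name__ == "__main__":
    # Toy pattern: PM -> DBA, DBA -> PRG, PRG -> DBA (assumed example data).
    pattern_labels = {"PM": "PM", "DBA": "DBA", "PRG": "PRG"}
    pattern_edges = [("PM", "DBA"), ("DBA", "PRG"), ("PRG", "DBA")]
    graph_labels = {"pm1": "PM", "pm2": "PM", "dba1": "DBA",
                    "dba2": "DBA", "prg1": "PRG"}
    graph_edges = [("pm1", "dba1"), ("dba1", "prg1"),
                   ("prg1", "dba1"), ("pm2", "dba2")]
    print(graph_simulation(pattern_labels, pattern_edges,
                           graph_labels, graph_edges))
```

In this toy graph, `dba2` has no outgoing edge to a PRG node, so it is pruned, and `pm2` is pruned in turn because its only DBA successor no longer matches. Note that simulation, unlike subgraph isomorphism, allows several data nodes to match one pattern node and vice versa.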

Data Graphs and Graph Pattern Queries
Graph Pattern Matching Using Views
A CHARACTERIZATION
PATTERN CONTAINMENT PROBLEMS
DETERMINING PATTERN CONTAINMENT
Minimal Containment Problem
Minimum Containment Problem
MAXIMALLY CONTAINED REWRITING
EXPERIMENTAL EVALUATION
CONCLUSION
