Abstract
Answering queries using views has proven effective for querying relational and semistructured data. This paper investigates this issue for graph pattern queries based on graph simulation. We propose a notion of pattern containment to characterize graph pattern matching using graph pattern views. We show that a pattern query can be answered using a set of views if and only if it is contained in the views. Based on this characterization, we develop efficient algorithms to answer graph pattern queries. We also study problems for determining (minimal, minimum) containment of pattern queries. We establish their complexity (from cubic-time to NP-complete) and provide efficient checking algorithms (approximation when the problem is intractable). In addition, when a pattern query is not contained in the views, we study maximally contained rewriting to find approximate answers; we show that it is in cubic-time to compute such rewriting, and present a rewriting algorithm. We experimentally verify that these methods are able to efficiently answer pattern queries on large real-world graphs.
Highlights
Answering queries using views has been extensively studied for relational data [27], [33], XML [30], [50], [51] and semistructured data [11], [43], [52]
We focus on graph pattern matching defined in terms of graph simulation [28], since it is commonly used in social community detection [10], biological analysis [35], and mobile network analyses [24]
(1) To characterize when graph pattern queries can be answered using views based on graph simulation, we propose a notion of pattern containment (Section 3)
Summary
Answering queries using views has been extensively studied for relational data [27], [33], XML [30], [50], [51] and semistructured data [11], [43], [52]. This work extends [21] by including new proofs, results and experimental study: (1) proofs for the pattern containment characterization (Section 3); (2) proofs of the fundamental problems for pattern containment (Section 4); (3) algorithms contain and minimum (Section 5); (4) results and proofs for maximally contained rewriting for graph pattern matching (Section 6), a topic not studied in [21]; and (5) two sets of new experiments (Section 7): one for evaluating the effectiveness of our approach using graphs with billions of nodes and edges [3], and the other for the efficiency and accuracy of approximate query answering by means of maximally contained rewriting. Maximally contained views can be combined with access constraints [12] to compute exact query answers following [17] Taken together, these methods yield a promising approach to querying “big” graphs. We state the problem of pattern matching using views
Talk to us
Join us for a 30 min session where you can share your feedback and ask us any queries you have
More From: IEEE Transactions on Knowledge and Data Engineering
Disclaimer: All third-party content on this website/platform is and will remain the property of their respective owners and is provided on "as is" basis without any warranties, express or implied. Use of third-party content does not indicate any affiliation, sponsorship with or endorsement by them. Any references to third-party content is to identify the corresponding services and shall be considered fair use under The CopyrightLaw.