Abstract

As knowledge graphs have grown rapidly in popularity, large numbers of RDF graphs have been released, which raises the challenge of answering subgraph matching queries in a distributed setting. In this paper, we propose an efficient distributed method to answer subgraph matching queries on big RDF graphs using MapReduce. In our method, query graphs are decomposed into a set of stars, using the semantic and structural information embedded in RDF graphs as heuristics. Two optimization techniques are proposed to further improve the efficiency of our algorithms. One, called RDF property filtering, filters out invalid input data to reduce intermediate results; the other improves query performance by postponing Cartesian product operations. Extensive experiments on both synthetic and real-world datasets show that our method outperforms its close competitors S2X and SHARD by an order of magnitude on average.

Highlights

  • More than one decade ago, the Semantic Web was proposed by Berners-Lee et al. [3], which has become a series of W3C standards in order to realize the machine

  • The Resource Description Framework, a graph-based data model, is commonly used to represent and organize resources in knowledge graphs because of its flexibility

  • Our main contributions include: (1) we propose an efficient and scalable distributed algorithm based on star decomposition, called StarMR, for answering subgraph matching queries on Resource Description Framework (RDF) graphs; (2) we devise two optimization strategies for StarMR, one of which uses the properties in RDF graphs to filter out invalid input data in MapReduce iterations, while the other postpones part of the Cartesian product operations to the final step of MapReduce so that some unpromising Cartesian products are never computed; and (3) we conduct extensive experiments on both synthetic and real-world RDF graphs to verify the efficiency and scalability of our method
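
The first optimization in contribution (2) can be illustrated with a small single-process sketch. This is not the StarMR implementation: the function name `filter_by_property`, the tuple representation of triples, and the `?` prefix for variables are assumptions made for this example. It only shows the general idea of discarding triples whose property cannot take part in any match before later, more expensive steps see them.

```python
def filter_by_property(triples, query_patterns):
    """Keep only triples whose property occurs in the query.

    triples and query_patterns are iterables of (subject, property, object)
    tuples. Terms starting with "?" are treated as variables (an assumption
    of this sketch, not something the summary specifies).
    """
    patterns = list(query_patterns)
    # A variable in the property position can match any property,
    # so no triple can be ruled out in that case.
    if any(p.startswith("?") for _, p, _ in patterns):
        return list(triples)
    wanted = {p for _, p, _ in patterns}
    return [t for t in triples if t[1] in wanted]


# Hypothetical data, for illustration only.
data = [
    ("alice", "advisor", "bob"),
    ("alice", "takesCourse", "db101"),
    ("bob", "worksFor", "dept1"),
]
query = [("?x", "advisor", "?y"), ("?x", "takesCourse", "?c")]
kept = filter_by_property(data, query)
```

In this toy input, the `worksFor` triple is dropped because no query pattern uses that property, so it can never contribute to an intermediate result.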

Summary

Introduction

More than one decade ago, the Semantic Web was proposed by Berners-Lee et al. [3], which has become a series of W3C standards in order to realize the machine. Our main contributions include: (1) we propose an efficient and scalable distributed algorithm based on star decomposition, called StarMR, for answering subgraph matching queries on RDF graphs; (2) we devise two optimization strategies for StarMR, one of which uses the properties in RDF graphs to filter out invalid input data in MapReduce iterations, while the other postpones part of the Cartesian product operations to the final step of MapReduce so that some unpromising Cartesian products are never computed; and (3) we conduct extensive experiments on both synthetic and real-world RDF graphs to verify the efficiency and scalability of our method.
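
To make "star decomposition" in contribution (1) concrete, here is a minimal sketch that splits a query graph into stars, where a star is a root vertex together with some of its incident edges. The greedy largest-star-first rule, the function name `star_decompose`, and the example query are assumptions for illustration. The paper states that its decomposition uses semantic and structural information from the RDF graph as heuristics, which this sketch does not attempt to reproduce.

```python
from collections import defaultdict


def star_decompose(patterns):
    """Greedily split triple patterns into stars.

    patterns is a list of (subject, property, object) tuples.
    Returns a list of (root, edges) pairs in which every input
    pattern appears in exactly one star.
    """
    remaining = list(patterns)
    stars = []
    while remaining:
        # Group the still-uncovered edges by each endpoint they touch.
        incident = defaultdict(list)
        for s, p, o in remaining:
            incident[s].append((s, p, o))
            if o != s:
                incident[o].append((s, p, o))
        # Pick the vertex with the most uncovered edges; ties are
        # broken by vertex name so the result is deterministic.
        root = max(incident, key=lambda v: (len(incident[v]), v))
        edges = incident[root]
        stars.append((root, edges))
        covered = set(edges)
        remaining = [t for t in remaining if t not in covered]
    return stars


# Hypothetical query graph, for illustration only.
query = [
    ("?x", "advisor", "?y"),
    ("?x", "takesCourse", "?c"),
    ("?y", "teacherOf", "?c"),
    ("?x", "type", "Student"),
]
stars = star_decompose(query)
```

For this query the sketch yields two stars: one rooted at `?x` with three edges and one rooted at `?y` with the remaining `teacherOf` edge. Matches for the individual stars would then be joined on their shared variables, which is the step a MapReduce-based method distributes.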

Related Work
Preliminaries
The StarMR Algorithm
Storage Schema
Star Matching
Star Decomposition of Query Graphs
Subgraph Matching Algorithm Using MapReduce
Two Optimization Strategies
RDF Property Filtering
Postponing Cartesian Product Operations
Settings
Experiments on WatDiv Datasets
Experiments
Efficiency on WatDiv
Scalability on WatDiv
Experiments on LUBM
Efficiency on LUBM
Scalability on LUBM
Experiments on the Real‐World Dataset
Efficiency on DBpedia
Scalability on DBpedia
Findings
Conclusion