Abstract

With RDF becoming the de facto standard for representing knowledge graphs, it is indispensable to develop scalable subgraph matching algorithms over big RDF graphs stored in distributed clusters. In this paper, we propose SP-Tree, a novel distributed subgraph matching method built on the Pregel model, to answer subgraph matching queries on big RDF graphs. In our method, the query graph is transformed into a variant of a spanning tree based on shortest paths. Two optimization techniques are proposed to improve the efficiency of our algorithms: one employs RDF shapes to filter out local computations and the messages passed among vertices; the other postpones Cartesian product operations in the matching process to reduce intermediate results. Extensive experiments on both synthetic and real-world datasets show that our SP-Tree subgraph matching method outperforms state-of-the-art methods by an order of magnitude.
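The abstract only states that the query graph becomes a spanning-tree variant "based on the shortest paths". One plausible reading, sketched below under that assumption, is a breadth-first spanning tree rooted at a chosen query vertex: in an unweighted query graph, every root-to-vertex path in a BFS tree is a shortest path. The function name `shortest_path_tree`, the choice of root, and traversing edges in both directions are illustrative assumptions, not the paper's definition. Query edges left out of the tree are returned so they can be checked separately.

```python
from collections import deque

def shortest_path_tree(query_edges, root):
    """Build a BFS spanning tree of a query graph (hypothetical helper).

    query_edges: list of (subject_var, predicate, object_var) triple patterns.
    Assumption: edges are traversed in both directions, since a query vertex
    may be reached through an incoming or an outgoing edge.
    Returns (parent, non_tree_edges), where parent maps each reached vertex
    to (parent_vertex, edge_index), or None for the root.
    """
    adj = {}
    for i, (s, _, o) in enumerate(query_edges):
        adj.setdefault(s, []).append((o, i))
        adj.setdefault(o, []).append((s, i))

    parent = {root: None}
    tree_edge_ids = set()
    queue = deque([root])
    while queue:
        u = queue.popleft()
        for v, i in adj.get(u, []):
            if v not in parent:  # first visit in BFS = shortest hop distance
                parent[v] = (u, i)
                tree_edge_ids.add(i)
                queue.append(v)

    non_tree = [e for i, e in enumerate(query_edges) if i not in tree_edge_ids]
    return parent, non_tree

# Hypothetical query: a triangle on ?x, ?y, ?z plus a tail edge to ?c.
query = [
    ("?x", "knows", "?y"),
    ("?y", "knows", "?z"),
    ("?x", "knows", "?z"),
    ("?z", "worksAt", "?c"),
]
parent, non_tree = shortest_path_tree(query, "?x")
print(parent)    # {'?x': None, '?y': ('?x', 0), '?z': ('?x', 2), '?c': ('?z', 3)}
print(non_tree)  # [('?y', 'knows', '?z')]
```

Choosing a different root generally yields a different tree, so root selection is a design decision in its own right.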

Highlights

  • The SP-Tree-based RDF subgraph matching algorithm, developed using Pregel, is described in detail
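To make the vertex-centric style concrete, here is a minimal single-process simulation of Pregel-like supersteps for a path-shaped query ?x0 -p1-> ?x1 -p2-> ... . It is a hypothetical sketch, not the paper's algorithm: each message carries a partial match, a message is forwarded only along an edge whose predicate equals the next query predicate, and vertices that receive no messages do no work in the next superstep. Injectivity, constants in the query, and the RDF-shape and SP-Path optimizations are deliberately omitted. The toy triples and the name `match_path_pregel` are made up for illustration.

```python
from collections import defaultdict

# Hypothetical toy RDF graph as (subject, predicate, object) triples.
TRIPLES = [
    ("alice", "knows", "bob"),
    ("bob", "worksAt", "acme"),
    ("carol", "knows", "dave"),
    ("dave", "worksAt", "initech"),
    ("erin", "knows", "frank"),
]

def match_path_pregel(triples, predicates):
    """Match ?x0 -p1-> ?x1 -p2-> ... with simulated supersteps.

    Messages are partial matches (tuples of data vertices). One superstep
    is executed per query predicate.
    """
    adj = defaultdict(list)
    for s, p, o in triples:
        adj[s].append((p, o))
    vertices = {s for s, _, _ in triples} | {o for _, _, o in triples}

    # Superstep 0: every data vertex is a candidate for ?x0.
    inbox = {v: [(v,)] for v in vertices}
    for pred in predicates:
        outbox = defaultdict(list)
        for v, partials in inbox.items():
            for p, nbr in adj.get(v, []):
                if p == pred:  # extend only along edges matching the query edge
                    for partial in partials:
                        outbox[nbr].append(partial + (nbr,))
        # Only vertices that received messages appear in the next inbox,
        # mirroring Pregel's inactive vertices.
        inbox = outbox
    return sorted(m for msgs in inbox.values() for m in msgs)

print(match_path_pregel(TRIPLES, ["knows", "worksAt"]))
# [('alice', 'bob', 'acme'), ('carol', 'dave', 'initech')]
```

The partial match ending at `frank` is dropped in the second superstep because `frank` has no `worksAt` edge; a real Pregel system would distribute the same per-vertex logic across workers.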

Summary

INTRODUCTION

The Resource Description Framework (RDF) is a W3C recommendation used for representing and organizing resources in knowledge graphs. In [11], a parallel subgraph listing framework was developed that relies on graph traversal without query decomposition. This method cannot be adapted to RDF graphs with vertex and edge labels. Our main contributions are as follows: (1) we propose an efficient and scalable distributed algorithm, based on the parallel graph computation model Pregel, for answering subgraph matching queries on big RDF graphs; (2) we devise two optimization techniques for the basic algorithm, one of which uses RDF Shapes to prune local computations of vertices and the messages passed among vertices in Pregel iterations, while the other decomposes the SP-Tree into a set of SP-Paths to postpone part of the Cartesian product operations; and (3) we conduct extensive experiments on both synthetic and real-world RDF graphs to verify the efficiency and scalability of our method.
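The idea behind postponing Cartesian products can be illustrated with a small sketch. Assume a tree-shaped query is split into its root-to-leaf paths (used here as a stand-in for SP-Paths) and each path is matched independently. Then combinations across branches need to be formed only once, in a final join on shared variables, instead of being expanded at every branching vertex during message passing. The helper names, the nested-loop natural join, and the per-path matches below are hypothetical; the paper's actual decomposition and join strategy may differ.

```python
from functools import reduce

def root_to_leaf_paths(children, root):
    """Split a rooted tree (node -> list of children) into root-to-leaf paths."""
    paths, stack = [], [(root, [root])]
    while stack:
        node, path = stack.pop()
        kids = children.get(node, [])
        if not kids:
            paths.append(path)
        for kid in reversed(kids):  # reversed so paths come out left to right
            stack.append((kid, path + [kid]))
    return paths

def natural_join(left, right):
    """Merge two lists of bindings that agree on every shared variable."""
    joined = []
    for l in left:
        for r in right:
            if all(l[k] == r[k] for k in l.keys() & r.keys()):
                joined.append({**l, **r})
    return joined

# Hypothetical tree-shaped query with two branches below ?x.
children = {"?x": ["?y", "?w"], "?y": ["?z"]}
paths = root_to_leaf_paths(children, "?x")
print(paths)  # [['?x', '?y', '?z'], ['?x', '?w']]

# Hypothetical matches, one list per path, produced independently.
matches_per_path = [
    [{"?x": "a", "?y": "b", "?z": "c"}, {"?x": "a", "?y": "d", "?z": "e"}],
    [{"?x": "a", "?w": "f"}, {"?x": "a", "?w": "g"}, {"?x": "h", "?w": "i"}],
]
# The cross-branch Cartesian product is formed only here, at the end.
results = reduce(natural_join, matches_per_path)
print(len(results))  # 4
```

With two matches on the first path and two compatible matches on the second (the `?x = "h"` row finds no partner), the deferred join yields 2 × 2 = 4 results, and that product is materialized only in this final step.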

RELATED WORK
QUERY GRAPH BASED METHODS
SP-TREE OF A QUERY GRAPH
SUBGRAPH MATCHING ALGORITHM USING PREGEL
OPTIMIZATION TECHNIQUES
POSTPONING CARTESIAN PRODUCT OPERATIONS
EXPERIMENTS
Findings
CONCLUSION