Abstract

A key property of Linked Data is the representation and publication of data as interconnected labelled graphs where different resources linked to each other form a network of meaningful information. Searching these important relationships between resources – within single or distributed graphs – can be reduced to a pathfinding or navigation problem, i.e., looking for chains of intermediate nodes. SPARQL1.1, the current standard query language for RDF-based Linked Data defines a construct – called Property Paths (PPs) – to navigate between entities within a single graph. Since Linked Data technologies are naturally aimed at decentralised scenarios, there are many cases where centralising this data is not feasible or even not possible for querying purposes. To address these problems, we propose a SPARQL PP-based graph processing approach – dubbed DpcLD – where users can execute SPARQL PP queries and find paths distributed across multiple, connected graphs exposed as SPARQL endpoints. To execute the distributed path queries we implemented an index-free, cache-based query engine that communicates with a shared algorithm running on each remote endpoint, and computes the distributed paths. In this paper, we highlight the way in which this approach exploits and aggregates partial paths, within a distributed environment, to produce complete results. We perform extensive experiments to demonstrate the performance of our approach on two datasets: One representing 10 million triples from the DBPedia SPARQL benchmark, and another full benchmark dataset corresponding to 124 million triples. We also perform a scalability test of our approach using real-world genomics datasets distributed across multiple endpoints. We compare our distributed approach with other distributed and centralized pathfinding approaches, showing that it outperforms other distributed approaches by orders of magnitude, and provides a good trade-off for cases when the data cannot be centralised.

Full Text
Published version (Free)

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call