Abstract

RDF has seen increased adoption in recent years, prompting the standardization of the SPARQL query language for RDF, and the development of local and distributed engines for processing SPARQL queries. This survey paper provides a comprehensive review of techniques and systems for querying RDF knowledge graphs. While other reviews on this topic tend to focus on the distributed setting, this work focuses on state-of-the-art storage, indexing and query processing techniques for efficiently evaluating SPARQL queries in a local setting (on one machine). To keep the survey self-contained, we also provide a short discussion of graph partitioning techniques used in the distributed setting. We conclude by discussing contemporary research challenges for further improving SPARQL query engines. An extended version also provides a survey of over one hundred SPARQL query engines and the techniques they use, along with twelve benchmarks and their features.

Highlights

  • The Resource Description Framework (RDF) is a graph-based data model where triples of the form (s, p, o) denote directed labeled edges $s \xrightarrow{p} o$ in a graph

  • In Appendix A we present a comprehensive survey of 135 individual RDF stores and SPARQL query engines (both distributed and local) in terms of which of the discussed techniques they use

  • While RDF stores and SPARQL engines have traditionally relied on relational databases and relational-style optimizations to ensure scalability and efficiency, we see a growing trend towards (1) native graph-based storage, indexing and query processing techniques, along with (2) exploiting modern hardware and data management/processing

Summary

Introduction

The Resource Description Framework (RDF) is a graph-based data model where triples of the form (s, p, o) denote directed labeled edges $s \xrightarrow{p} o$ in a graph. RDF has gained significant adoption in recent years, particularly on the Web. As of 2019, over 5 million websites publish RDF data embedded in their webpages [34]. RDF has become a popular format for publishing knowledge graphs on the Web, the largest of which (including Bio2RDF, DBpedia, PubChemRDF, UniProt, and Wikidata) contain billions of triples. These developments have brought about the need for optimized techniques and engines for querying large RDF graphs. We refer to engines that allow for storing, indexing and processing joins over RDF as RDF stores.
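
To make the data model concrete, the following is a minimal sketch of an RDF graph as a set of (s, p, o) triples, with a wildcard pattern match and a two-pattern join of the kind an RDF store must process. The data, the shortened identifiers, and the function names `match` and `friends_of_friends` are illustrative assumptions and are not taken from any engine covered by the survey. The linear scan is deliberately naive; it stands in for the lookups that a real store would serve from indexes.

```python
from typing import Iterator, Optional, Set, Tuple

Triple = Tuple[str, str, str]

# Hypothetical example graph; real RDF terms would be IRIs or literals.
GRAPH: Set[Triple] = {
    ("alice", "knows", "bob"),
    ("bob", "knows", "carol"),
    ("alice", "worksAt", "acme"),
}


def match(
    graph: Set[Triple],
    s: Optional[str] = None,
    p: Optional[str] = None,
    o: Optional[str] = None,
) -> Iterator[Triple]:
    """Yield every triple matching the pattern; None acts as a wildcard."""
    for ts, tp, to in graph:
        if (s is None or s == ts) and (p is None or p == tp) and (o is None or o == to):
            yield (ts, tp, to)


def friends_of_friends(graph: Set[Triple], person: str) -> Set[str]:
    """Join two triple patterns: (person, knows, ?x) and (?x, knows, ?y)."""
    return {
        y
        for _, _, x in match(graph, s=person, p="knows")
        for _, _, y in match(graph, s=x, p="knows")
    }
```

Each call to `match` corresponds to evaluating one triple pattern, and `friends_of_friends` is a nested-loop join on the shared variable ?x. This sketch only illustrates the shape of the problem, not how any particular system solves it.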
