Abstract

We define a procedure for canonicalising SPARQL 1.1 queries. Specifically, given two input queries that return the same solutions modulo variable names over any RDF graph (which we call congruent queries), the canonicalisation procedure aims to rewrite both input queries to a syntactically canonical query that likewise returns the same results modulo variable renaming. The use-cases for such canonicalisation include caching, optimisation, redundancy elimination, question answering, and more besides. To begin, we formally define the semantics of the SPARQL 1.1 language, including features often overlooked in the literature. We then propose a canonicalisation procedure based on mapping a SPARQL query to an RDF graph, applying algebraic rewritings, removing redundancy, and then using canonical labelling techniques to produce a canonical form. Unfortunately a full canonicalisation procedure for SPARQL 1.1 queries would be undecidable. We rather propose a procedure that we prove to be sound and complete for a decidable fragment of monotone queries under both set and bag semantics, and that is sound but incomplete in the case of the full SPARQL 1.1 query language. Although the worst case of the procedure is super-exponential, our experiments show that it is efficient for real-world queries, and that such difficult cases are rare.

Highlights

  • R The Semantic Web provides a variety of standards and techniques for enhancing the machine-readability of Web content in order to increase the levels of automation possible for day-to-day tasks

  • T but incomplete, canonicalisation of the full SPARQL 1.1 query language, whereby the canonicalised query will be congruent to the input query, but not all pairs of congruent input queries will result in the same output query

  • If bag R semantics is selected, the unions of conjunctive queries (UCQs) can only contain a syntactic form of redundancy: exact duplicate triple patterns in the same basic graph patterns (BGPs), which are implicitly removed since we model BGPs as sets of triple patterns

Read more

Summary

Introduction

R The Semantic Web provides a variety of standards and techniques for enhancing the machine-readability of Web content in order to increase the levels of automation possible for day-to-day tasks. O work for the graph-based representation of data on the Semantic Web. In turn, SPARQL [24] is the standard querying C language for RDF, composed of basic graph patterns extended with expressive features that include path expressions, relational algebra, aggregation, federation, among others. The adoption of RDF as a data model and SPARQL as a query language has grown significantly in recent years [4,26]. Prominent datasets such as DBpedia [35] and Wikidata [61] contain in the order of hundreds of millions or even billions of RDF triples, and their associated SPARQL endpoints receive millions of queries per day [37,52]. The same study identified the complexity of SPARQL queries as one of the main causes

Objectives
Results
Conclusion
Full Text
Published version (Free)

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call