Executing queries over schemaless RDF databases

Gunes Aluc,Khuzaima Daudjee,M Tamer Ozsu,Olaf Hartig

doi:10.1109/icde.2015.7113335

Abstract

Recent advances in Linked Data Management and the Semantic Web have led to a rapid increase in both the quantity as well as the variety of Web applications that rely on the SPARQL interface to query RDF data. Thus, RDF data management systems are increasingly exposed to workloads that are far more diverse and dynamic than what these systems were designed to handle. The problem is that existing systems rely on a workload-oblivious physical representation that has a fixed schema, which is not suitable for diverse and dynamic workloads. To address these issues, we propose a physical representation that is schemaless. The resulting flexibility enables an RDF dataset to be clustered based purely on the workload, which is key to achieving good performance through optimized I/O and cache utilization. Consequently, given a workload, we develop techniques to compute a good clustering of the database. We also design a new query evaluation model, namely, schemaless-evaluation that leverages this workload-aware clustering of the database whereby, with high probability, each tuple in the result set of a query is expected to be contained in at most one cluster. Our query evaluation model exploits this property to achieve better performance while ensuring fast generation of query plans without being hindered by the lack of a fixed physical schema.

Full Text