Efficient querying of multidimensional RDF data with aggregates: Comparing NoSQL, RDF and relational data stores

Franck Ravat, Jiefu Song, Olivier Teste, Cassia Trojahn

doi:10.1016/j.ijinfomgt.2020.102089

Open Access

Abstract

This paper proposes an approach to the problem of querying large volumes of statistical RDF data. Our approach relies on pre-aggregation strategies to better manage the analysis of this kind of data. Specifically, we define a conceptual model that represents original RDF data together with aggregates in a multidimensional structure. We then propose a set of translation rules for converting a well-known multidimensional RDF modelling vocabulary into this conceptual model. We implement the conceptual model in six different data stores: two RDF triple stores (Jena TDB and Virtuoso), one graph-oriented NoSQL database (Neo4j), one column-oriented data store (Cassandra), and two relational databases (MySQL and PostgreSQL). We compare the querying performance of these data stores with and without aggregates. Experimental results on real-world datasets containing 81.92 million triples show that pre-aggregation reduces query runtime in all data stores. With aggregates, the Neo4j NoSQL database and the relational databases outperform the triple stores, reducing query runtime by up to 99%.
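
To make the idea of pre-aggregation concrete, the following minimal sketch contrasts answering an aggregate query on the fly with reading a stored aggregate. It is only an illustration under stated assumptions, not the authors' implementation: it uses an in-memory SQLite database as a stand-in for a relational store, and the table names (`observation`, `agg_by_year`), columns, and sample values are all hypothetical.

```python
import sqlite3


def build_store():
    """Create a toy observation table plus a pre-aggregated table (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE observation (year INTEGER, region TEXT, value REAL)"
    )
    rows = [
        (2019, "north", 10.0),
        (2019, "north", 5.0),
        (2019, "south", 7.0),
        (2020, "north", 3.0),
        (2020, "south", 4.0),
        (2020, "south", 6.0),
    ]
    conn.executemany("INSERT INTO observation VALUES (?, ?, ?)", rows)
    # Pre-aggregation: compute the per-year totals once and store them.
    conn.execute(
        "CREATE TABLE agg_by_year AS "
        "SELECT year, SUM(value) AS total FROM observation GROUP BY year"
    )
    conn.commit()
    return conn


def total_on_the_fly(conn, year):
    """Aggregate the detailed observations at query time."""
    cur = conn.execute(
        "SELECT SUM(value) FROM observation WHERE year = ?", (year,)
    )
    return cur.fetchone()[0]


def total_precomputed(conn, year):
    """Read the stored aggregate instead of scanning the detailed rows."""
    cur = conn.execute(
        "SELECT total FROM agg_by_year WHERE year = ?", (year,)
    )
    return cur.fetchone()[0]


if __name__ == "__main__":
    store = build_store()
    print(total_on_the_fly(store, 2019), total_precomputed(store, 2019))
```

Both functions return the same total; the second trades extra storage for a lookup over far fewer rows, which is the general trade-off that pre-aggregation exploits.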

Full Text