Fast Processing SPARQL Queries on Large RDF Data

Guang Yang, Pingpeng Yuan, Hai Jin

doi:10.1109/dasc-picom-datacom-cyberscitec.2016.166




Abstract

The RDF (Resource Description Framework) data model is used in various domains, such as the Web, government, and biology, and the volume of RDF datasets is growing significantly. This growth raises a serious challenge: how to answer SPARQL queries on large RDF datasets efficiently. Here, we present TripleParallel, a large-scale RDF data system that implements block-based parallel processing of SPARQL queries on RDF datasets with billions of triples. The system improves parallelism, increases the overlap between data access and computation, and reduces the overall execution time of a query. TripleParallel also implements multiple parallel operations for processing joins in parallel. Experimental studies with several RDF datasets, including LUBM and the UniProt collection, demonstrate the performance gains of our approach, which outperforms the previous fastest system by more than an order of magnitude.
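The general idea of block-based parallel scanning followed by a join can be illustrated with a minimal Python sketch. It is not TripleParallel's implementation: the toy triples, the block size, and every function name here (`split_into_blocks`, `scan_block`, `parallel_scan`, `hash_join`, `run_query`) are assumptions made for illustration, and the join is an ordinary sequential hash join rather than one of the parallel join operations the abstract mentions.

```python
from concurrent.futures import ThreadPoolExecutor

# Toy triples (subject, predicate, object); hypothetical data, not from the paper.
TRIPLES = [
    ("alice", "memberOf", "dept1"),
    ("bob", "memberOf", "dept2"),
    ("carol", "memberOf", "dept1"),
    ("dept1", "subOrganizationOf", "univ0"),
    ("dept2", "subOrganizationOf", "univ1"),
]

def split_into_blocks(triples, block_size):
    """Cut the triple list into fixed-size blocks that can be scanned independently."""
    return [triples[i:i + block_size] for i in range(0, len(triples), block_size)]

def scan_block(block, predicate):
    """Return the (subject, object) pairs in one block whose predicate matches."""
    return [(s, o) for s, p, o in block if p == predicate]

def parallel_scan(blocks, predicate, workers=4):
    """Scan all blocks concurrently and concatenate the per-block matches."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map returns results in block order, so output order is deterministic.
        parts = list(pool.map(lambda b: scan_block(b, predicate), blocks))
    return [pair for part in parts for pair in part]

def hash_join(left, right):
    """Join left (x, y) with right (y, z) on y, using a hash table built on right."""
    index = {}
    for y, z in right:
        index.setdefault(y, []).append(z)
    return [(x, y, z) for x, y in left for z in index.get(y, [])]

def run_query(triples, block_size=2):
    """Answer the pattern: ?x memberOf ?d . ?d subOrganizationOf ?u"""
    blocks = split_into_blocks(triples, block_size)
    members = parallel_scan(blocks, "memberOf")
    suborgs = parallel_scan(blocks, "subOrganizationOf")
    return sorted(hash_join(members, suborgs))
```

Calling `run_query(TRIPLES)` returns one `(x, d, u)` binding per matching pair of triples, and the result does not depend on the chosen block size. In CPython, threads mainly help when block scans wait on I/O; a CPU-bound version would need processes or a language with native threads.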
