Abstract

Efficient SPARQL query evaluation is a significant challenge when a database contains billions of RDF triples, as is common for Web-scale RDF data sources. We address this challenge by 1) partitioning the whole RDF dataset into small partitions according to the schemas of the RDF subjects, and 2) carefully placing the partitions within the cluster so that, on each local partition, we can take full advantage of a state-of-the-art SPARQL query processing engine and, across partitions, we can exploit the power of parallel databases to achieve scalable query evaluation over massive RDF data. This paper introduces the data partitioning and placement strategies, as well as the SPARQL query evaluation and optimization techniques for a cluster environment. Experiments are conducted over a synthesized dataset and a real dataset containing billions of triples. The results demonstrate better query evaluation performance than the baseline.
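
To make the two steps concrete, the following is a minimal illustrative sketch, not the paper's actual algorithm. The abstract does not define a subject's "schema" or the placement policy, so the sketch assumes that a schema is the set of predicates attached to a subject, and it substitutes a simple greedy least-loaded placement for the paper's strategy. The function names `partition_by_subject_schema` and `place_partitions` are hypothetical.

```python
from collections import defaultdict


def partition_by_subject_schema(triples):
    """Group triples so that subjects with the same predicate set share a partition.

    Assumption (not from the paper): a subject's "schema" is the frozenset
    of predicates that appear with that subject.
    """
    # First pass: collect the predicates used by each subject.
    predicates_of = defaultdict(set)
    for s, p, o in triples:
        predicates_of[s].add(p)

    # Second pass: route every triple to the partition keyed by its subject's schema.
    partitions = defaultdict(list)
    for s, p, o in triples:
        partitions[frozenset(predicates_of[s])].append((s, p, o))
    return dict(partitions)


def place_partitions(partitions, num_nodes):
    """Assign partitions to nodes, largest first, always onto the least-loaded node.

    Assumption: this greedy load balancing stands in for the paper's
    placement strategy, which the abstract does not describe.
    """
    loads = [0] * num_nodes
    placement = {}
    for schema, part in sorted(partitions.items(), key=lambda kv: -len(kv[1])):
        node = loads.index(min(loads))  # least-loaded node so far
        placement[schema] = node
        loads[node] += len(part)
    return placement


# Toy data: two "person-like" subjects and one "organization-like" subject.
triples = [
    ("alice", "name", "Alice"),
    ("alice", "age", "30"),
    ("bob", "name", "Bob"),
    ("bob", "age", "25"),
    ("acme", "label", "Acme"),
]
partitions = partition_by_subject_schema(triples)
placement = place_partitions(partitions, num_nodes=2)
```

Under these assumptions, all triples of a subject stay on one node, so a star-shaped query pattern over a single subject can be answered by the local engine, while patterns that span schemas would need cross-partition coordination.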
