PigSPARQL

Alexander Schätzle,Martin Przyjaciel-Zablocki,Georg Lausen

doi:10.1145/1999299.1999303

Abstract

In this paper we investigate the scalable processing of complex SPARQL queries on very large RDF datasets. As underlying platform we use Apache Hadoop, an open source implementation of Google's MapReduce for massively parallelized computations on a computer cluster. We introduce PigSPARQL, a system which gives us the opportunity to process complex SPARQL queries on a MapReduce cluster. To this end, SPARQL queries are translated into Pig Latin, a data analysis language developed by Yahoo! Research. Pig Latin programs are executed by a series of MapReduce jobs on a Hadoop cluster. We evaluate the processing of SPARQL queries by means of PigSPARQL using the SP2Bench, a SPARQL specific performance benchmark and demonstrate that PigSPARQL enables a scalable execution of SPARQL queries based on Hadoop without any additional programming efforts.

Full Text