HyPSo

Tanvi Chawla, Girdhari Singh, Emmanuel S Pilli

doi:10.1145/3297001.3297025

Abstract

RDF is increasingly used to model data for the Semantic Web, and the resulting proliferation of RDF data has created the need for distributed RDF storage solutions. This rapid growth in the volume of RDF data is a pressing issue that requires a scalable solution. Two of the most common challenges in handling Big RDF data are storage and query processing. In this paper, we introduce HyPSo, a hybrid RDF partitioning scheme that speeds up SPARQL query processing for Big RDF data. HyPSo combines two popular RDF partitioning schemes, vertical partitioning and hash partitioning, and thereby aims to provide a scalable solution for both storage and query processing of Big RDF data. Our objective is to speed up query processing by reducing query execution time. HyPSo achieves this through its partitioning: only the requisite properties are read, which reduces I/O cost, and join cost is minimized because subject-predicate bound queries can be answered by a single table scan. Big RDF data partitioned using HyPSo also occupies less disk space than under the vertical partitioning scheme. We compare HyPSo with two existing distributed RDF frameworks in terms of storage space and query execution time, and the results show that HyPSo delivers a significant improvement in performance.
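To make the combination of the two schemes concrete, the following is a minimal Python sketch, not the authors' implementation. It assumes that each vertically partitioned property table is further hash-partitioned on the subject; the bucket count NUM_BUCKETS, the helper names partition and lookup, and the example IRIs are all hypothetical. It illustrates the point made above: a subject-predicate bound pattern touches only one property table (and, under this assumption, one hash bucket of it), so it is answered by a single scan with no join.

```python
from collections import defaultdict
from zlib import crc32

# Hypothetical parameter: number of hash buckets per property table (not from the paper).
NUM_BUCKETS = 4


def bucket_of(subject: str, num_buckets: int = NUM_BUCKETS) -> int:
    """Deterministic hash of the subject (crc32 avoids Python's per-process salted str hash)."""
    return crc32(subject.encode("utf-8")) % num_buckets


def partition(triples):
    """Vertical partitioning by predicate, then hash partitioning of each
    predicate table on the subject (assumed scheme, for illustration only).
    Returns {predicate: {bucket: [(subject, object), ...]}}."""
    tables = defaultdict(lambda: defaultdict(list))
    for s, p, o in triples:
        # Only the (subject, object) pair is stored; the predicate is implied by the table.
        tables[p][bucket_of(s)].append((s, o))
    return tables


def lookup(tables, subject, predicate):
    """Answer a subject-predicate bound pattern (s, p, ?o) by scanning a
    single bucket of a single property table; no join is required."""
    # .get() on the defaultdicts avoids creating empty tables for unknown keys.
    bucket = tables.get(predicate, {}).get(bucket_of(subject), [])
    return [o for s, o in bucket if s == subject]


# Usage example with hypothetical triples.
triples = [
    ("ex:alice", "foaf:knows", "ex:bob"),
    ("ex:alice", "foaf:name", '"Alice"'),
    ("ex:bob", "foaf:knows", "ex:carol"),
]
tables = partition(triples)
print(lookup(tables, "ex:alice", "foaf:knows"))  # ['ex:bob']
```

Hashing with zlib.crc32 rather than Python's built-in hash keeps bucket assignment stable across processes, which would matter if buckets were spread across machines; a real distributed framework would keep these tables in distributed storage rather than in memory.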
