Scalable Indexing and Adaptive Querying of RDF Data in the Cloud

Nikolaos Papailiou, Ioannis Konstantinou, Panagiotis Karras, Nectarios Koziris, Dimitrios Tsoumakos

doi:10.1145/2630602.2630603




Abstract


Efficient RDF data management systems are central to the vision of the Semantic Web. The enormous increase in both user-generated and machine-generated content calls for scalable triple stores. Current systems manage to decentralize some or all of the stages of RDF data management, scaling to arbitrarily large numbers of triples. Yet these systems prove highly inflexible in adjusting their behavior to the query at hand. Queries over triple data include multiple joins with varying degrees of selectivity and cost, and in many cases a join performed on a single centralized node is highly preferable. Thus, both informed query planning and adaptive join execution are necessary to achieve optimal performance for both selective and non-selective queries. To that end, we describe H2RDF+, an RDF store that efficiently performs distributed joins over a multiple-index scheme. H2RDF+ materializes six RDF indexes and detailed statistics using HBase. In this work, we focus on our novel, scalable, and efficient MapReduce indexing process, which allows H2RDF+ to handle arbitrarily large RDF datasets. Aggressive byte-level compression is also used extensively to reduce the system's storage requirements. H2RDF+ can also process both complex and selective queries by adaptively choosing the amount of resources allocated to each join, based on join complexity estimated through index statistics.
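
The two ideas in the abstract, a six-index layout and a per-join choice of resources, can be illustrated with a minimal Python sketch. It rests on assumptions that the abstract does not confirm: the six indexes are taken to be the six orderings of subject, predicate, and object, and the function names `index_keys` and `choose_join_mode`, as well as the `centralized_limit` threshold, are hypothetical and not part of H2RDF+.

```python
from itertools import permutations


def index_keys(triple):
    """Return the six orderings of an (s, p, o) triple, keyed by index name.

    Assumption: the six indexes are the permutations of subject,
    predicate, and object; the abstract does not name them.
    """
    labeled = dict(zip("spo", triple))
    return {
        "".join(order): tuple(labeled[c] for c in order)
        for order in permutations("spo")
    }


def choose_join_mode(estimated_input_rows, centralized_limit=100_000):
    """Pick an execution mode from an estimated join input size.

    Hypothetical rule: small joins run on one node, large ones are
    distributed. The threshold value is illustrative only.
    """
    if estimated_input_rows <= centralized_limit:
        return "centralized"
    return "distributed"


keys = index_keys(("alice", "knows", "bob"))
for name, key in sorted(keys.items()):
    print(name, key)

print(choose_join_mode(500))
print(choose_join_mode(10_000_000))
```

Storing every ordering means any triple pattern with bound positions can be answered by a prefix scan over one of the indexes, at the cost of extra storage, which is one reason compression matters in such a design.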

Full Text