Abstract
Efficient RDF data management systems are central to the vision of the Semantic Web. The enormous increase in both user-generated and machine-generated content calls for scalable triple stores. Current systems decentralize some or all stages of RDF data management, scaling to arbitrarily large numbers of triples. Yet these systems prove highly inflexible in adjusting their behavior to the query at hand. Queries over triple data include multiple joins with varying degrees of selectivity and cost, and in many cases a join performed on a single centralized node is highly preferable. Thus, both informed query planning and adaptive join execution are necessary to achieve optimal performance for both selective and non-selective queries. To that end, we describe H2RDF+, an RDF store that efficiently performs distributed joins over a multiple-index scheme. H2RDF+ materializes six RDF indexes and detailed statistics using HBase. In this work, we emphasize our novel, scalable and efficient MapReduce indexing process, which allows H2RDF+ to handle arbitrarily large RDF datasets. Aggressive byte-level compression is also used extensively to reduce the system's storage requirements. H2RDF+ can process both complex and selective queries by adaptively choosing the amount of resources allocated to each join, based on join complexity estimated from index statistics.
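As a rough illustration of the two ideas above, the following minimal sketch assumes that the six indexes are the six orderings of subject, predicate and object (the abstract does not spell this out) and that the choice between a centralized and a distributed join reduces to comparing an estimated input size against a threshold. The function names and the threshold value are hypothetical and are not taken from H2RDF+.

```python
from itertools import permutations

# Hypothetical cut-off (in estimated input triples) above which a distributed
# join is assumed to pay off. This value is an illustration, not from the paper.
DISTRIBUTED_THRESHOLD = 1_000_000


def index_keys(s, p, o):
    """Return the six orderings of one triple, keyed by index name (e.g. 'pos')."""
    triple = {"s": s, "p": p, "o": o}
    return {
        "".join(order): tuple(triple[c] for c in order)
        for order in permutations("spo")
    }


def choose_index(bound):
    """Pick an index whose key prefix is exactly the bound positions of a pattern.

    `bound` is a set drawn from {'s', 'p', 'o'}; with six orderings, every
    combination of bound positions is a prefix of at least one index.
    """
    for order in permutations("spo"):
        if set(order[: len(bound)]) == set(bound):
            return "".join(order)


def choose_join_mode(estimated_input_triples):
    """Assumed policy: small joins run on one node, large ones are distributed."""
    if estimated_input_triples > DISTRIBUTED_THRESHOLD:
        return "distributed"
    return "centralized"
```

For example, a triple pattern with bound predicate and object, such as `?x rdf:type ub:Student`, can be answered by a prefix scan on the `pos` ordering, and under the assumed policy a join whose estimated input is a few thousand triples would run centrally.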