Abstract

Data generation and publication on the Web have increased over recent years. This phenomenon, usually known as “Big Data”, poses new challenges related to the Volume, Velocity, and Variety (“the three V's”) of data. The Semantic Web offers the means to deal with variety: RDF (Resource Description Framework) models data as subject-predicate-object triples, which can be represented and interconnected to build a true Web of Data. Nonetheless, a problem arises when big RDF collections must be stored, exchanged, and/or queried: the existing serialization formats are highly verbose, which aggravates the remaining Big Semantic Data challenges (volume and velocity). HDT addresses this issue by proposing a binary serialization format based on compact data structures that allows RDF not only to be compressed, but also to be queried without prior decompression. Thus, HDT reduces data volume and increases retrieval velocity. However, this achievement comes at the cost of an RDF-to-HDT serialization that is expensive in terms of computational resources and time. Therefore, HDT alleviates the velocity and volume challenges for the end user, but moves the Big Data challenges to the data publisher. In this work we present HDT-MR, a MapReduce-based algorithm that allows RDF datasets to be serialized to HDT in a distributed way, reducing processing resources and time, and also enabling larger datasets to be compressed.
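To make the idea of a compact, dictionary-based serialization more concrete, the following is a minimal sketch and not the HDT-MR algorithm itself. It assumes a single shared ID space for all terms, and it emulates the map and reduce phases in memory. The function names (`map_terms`, `reduce_terms`, `build_dictionary`, `encode`, `decode`) are hypothetical. The actual HDT layout and the distributed job structure are more elaborate and are not described in this abstract.

```python
from collections import defaultdict


def map_terms(triple):
    """Map phase (sketch): emit one (term, role) pair per triple component."""
    s, p, o = triple
    return [(s, "S"), (p, "P"), (o, "O")]


def reduce_terms(pairs):
    """Reduce phase (sketch): group the roles observed for each distinct term."""
    roles = defaultdict(set)
    for term, role in pairs:
        roles[term].add(role)
    return roles


def build_dictionary(triples):
    """Assign a 1-based integer ID to each distinct term, in sorted order.

    Assumption: one shared ID space for subjects, predicates, and objects.
    """
    pairs = [pair for t in triples for pair in map_terms(t)]
    roles = reduce_terms(pairs)
    return {term: i for i, term in enumerate(sorted(roles), start=1)}


def encode(triples, dictionary):
    """Replace each verbose term by its ID and sort the resulting ID triples."""
    return sorted((dictionary[s], dictionary[p], dictionary[o]) for s, p, o in triples)


def decode(id_triples, dictionary):
    """Invert the dictionary to recover the original terms."""
    inverse = {i: term for term, i in dictionary.items()}
    return [(inverse[s], inverse[p], inverse[o]) for s, p, o in id_triples]


if __name__ == "__main__":
    # Illustrative data only; the prefixes and names are made up.
    triples = [
        ("ex:alice", "foaf:knows", "ex:bob"),
        ("ex:bob", "foaf:knows", "ex:alice"),
        ("ex:alice", "foaf:name", '"Alice"'),
    ]
    dictionary = build_dictionary(triples)
    print(encode(triples, dictionary))
```

Each long term is stored once in the dictionary, and every triple shrinks to three integers. This illustrates why such a representation is smaller than a verbose text serialization, and why building it requires a full pass over all terms.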
