Abstract

The dynamicity of RDF data has motivated the development of solutions for archiving, i.e., the task of storing and querying previous versions of an RDF dataset. Querying the history of a dataset finds applications in data maintenance and analytics. Notwithstanding the value of RDF archiving, the state of the art in this field is underdeveloped: (i) most existing systems are neither scalable nor easy to use, (ii) there is no standard way to query RDF archives, and (iii) existing solutions do not exploit the evolution patterns of real RDF data. On these grounds, this paper surveys existing work on RDF archiving in order to characterize the gap between the state of the art and a fully-fledged solution. It also proposes RDFev, a framework to study the dynamicity of RDF data. We use RDFev to study the evolution of YAGO, DBpedia, and Wikidata, three dynamic and prominent datasets on the Semantic Web. The resulting insights lay the groundwork for a sketch of a fully-fledged archiving solution for RDF data.
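To make the notion of archiving concrete, here is a minimal Python sketch of a change-based (CB) archive, one of the storage paradigms covered by the survey: it keeps one full snapshot and, for every later version, only the sets of added and deleted triples. The class `ChangeBasedArchive` and its methods are hypothetical names chosen for illustration, and triples are assumed to be plain `(subject, predicate, object)` string tuples rather than objects from any RDF library. No claim is made that any surveyed system is implemented this way.

```python
from dataclasses import dataclass, field

# Assumption: a triple is a plain (subject, predicate, object) tuple of strings.
Triple = tuple[str, str, str]


@dataclass
class ChangeBasedArchive:
    """Hypothetical CB archive: one full snapshot plus one delta per later version."""

    initial: frozenset[Triple]
    # deltas[i] holds the (added, deleted) triples that turn version i into version i + 1.
    deltas: list[tuple[frozenset[Triple], frozenset[Triple]]] = field(default_factory=list)

    def commit(self, new_version: set[Triple]) -> None:
        """Store only the difference between the latest version and new_version."""
        latest = self.materialize(len(self.deltas))
        added = frozenset(new_version - latest)
        deleted = frozenset(latest - new_version)
        self.deltas.append((added, deleted))

    def materialize(self, version: int) -> set[Triple]:
        """Version materialization: replay deltas 0..version-1 on top of the snapshot."""
        if not 0 <= version <= len(self.deltas):
            raise IndexError(f"unknown version {version}")
        graph = set(self.initial)
        for added, deleted in self.deltas[:version]:
            graph -= deleted
            graph |= added
        return graph

    def delta(self, i: int, j: int) -> tuple[set[Triple], set[Triple]]:
        """Delta materialization: triples added and deleted between versions i and j."""
        gi, gj = self.materialize(i), self.materialize(j)
        return gj - gi, gi - gj


if __name__ == "__main__":
    archive = ChangeBasedArchive(initial=frozenset({("ex:Alice", "ex:worksFor", "ex:ACME")}))
    archive.commit({("ex:Alice", "ex:worksFor", "ex:Initech")})
    print(archive.materialize(0))
    print(archive.delta(0, 1))
```

The design trades query time for storage: deltas are compact, but retrieving an old version requires replaying every delta up to it, which is why version materialization tends to get slower as a CB archive grows.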

Highlights

  • The amount of RDF data has steadily grown since the conception of the Semantic Web in 2001 [13], as more and more organizations opt for RDF [66] as the format to publish and manage semantic data [39,41].

  • We have conducted a study of the evolution of three large RDF knowledge bases using our proposed framework RDFev, which takes a domain-agnostic approach from two perspectives: at the low level, it studies the dynamics of triples and vocabulary terms across different versions of an RDF dataset, whereas at the high level, it measures how those low-level changes translate into updates to the entities described in the experimental datasets.

  • Of the remaining testable solutions, namely Ostrich [75], Quit Store [8], R&WBase [79], R43ples [33], and x-RDF3X, only Ostrich [75] was able to run on our experimental datasets.
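The two perspectives of RDFev can be illustrated with a small Python sketch. The metric definitions below are assumptions made for illustration and may differ from the exact formulas used in the paper: the change ratio is the fraction of triples added or deleted among all triples present in either version, vocabulary dynamicity applies the same idea to the terms used in the triples, and an entity is approximated by a triple's subject. All function names are hypothetical.

```python
# Assumption: a triple is a plain (subject, predicate, object) tuple of strings.
Triple = tuple[str, str, str]


def change_ratio(old: set[Triple], new: set[Triple]) -> float:
    """Low level: share of triples added or deleted, over all triples in either version."""
    union = old | new
    if not union:
        return 0.0
    changed = (new - old) | (old - new)
    return len(changed) / len(union)


def terms(graph: set[Triple]) -> set[str]:
    """Vocabulary terms: every term occurring in any position of any triple."""
    return {term for triple in graph for term in triple}


def vocabulary_dynamicity(old: set[Triple], new: set[Triple]) -> float:
    """Low level: share of vocabulary terms that occur in only one of the two versions."""
    t_old, t_new = terms(old), terms(new)
    union = t_old | t_new
    if not union:
        return 0.0
    return len(t_old ^ t_new) / len(union)


def entity_changes(old: set[Triple], new: set[Triple]) -> dict[str, set[str]]:
    """High level: classify subjects (assumed entities) as added, deleted, or updated."""
    changed = (new - old) | (old - new)
    touched = {s for s, _, _ in changed}
    subj_old = {s for s, _, _ in old}
    subj_new = {s for s, _, _ in new}
    return {
        "added": subj_new - subj_old,
        "deleted": subj_old - subj_new,
        # Entities present in both versions whose triples changed.
        "updated": touched & subj_old & subj_new,
    }


if __name__ == "__main__":
    v1 = {("ex:a", "ex:p", "ex:b"), ("ex:c", "ex:p", "ex:d")}
    v2 = {("ex:a", "ex:p", "ex:e"), ("ex:c", "ex:p", "ex:d"), ("ex:f", "ex:q", "ex:g")}
    print(change_ratio(v1, v2))  # 0.75
    print(vocabulary_dynamicity(v1, v2))
    print(entity_changes(v1, v2))
```

In the example, one triple about `ex:a` is replaced and a new entity `ex:f` appears: three of the four distinct triples changed at the low level, while at the high level this amounts to one updated and one added entity.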

Summary

Introduction

The amount of RDF data has steadily grown since the conception of the Semantic Web in 2001 [13], as more and more organizations opt for RDF [66] as the format to publish and manage semantic data [39,41]. In this case the metadata associated with the actual triples is used to answer domain-specific requirements. Despite this plethora of work, there is currently no fully-fledged solution available for managing large and dynamic RDF datasets. This situation stems from multiple factors, such as (i) the performance and functionality limitations of RDF engines in handling metadata, (ii) the absence of a standard for querying RDF archives, and (iii) a disregard for the actual evolution of real RDF data.
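In the archiving setting, version information is one kind of metadata that can be attached to triples. The Python sketch below assumes each triple is annotated with the set of version numbers in which it holds, so that a triple pattern can be evaluated against a single version and the history of a triple can be looked up directly. `AnnotatedTripleStore` and its methods are hypothetical names; the sketch is not modelled on any specific engine's metadata mechanism.

```python
from collections import defaultdict
from typing import Optional

# Assumption: a triple is a plain (subject, predicate, object) tuple of strings.
Triple = tuple[str, str, str]


class AnnotatedTripleStore:
    """Hypothetical store where each triple carries metadata: the versions it belongs to."""

    def __init__(self) -> None:
        self.versions_of: dict[Triple, set[int]] = defaultdict(set)

    def add_version(self, version: int, graph: set[Triple]) -> None:
        """Annotate every triple of the given graph with the version number."""
        for triple in graph:
            self.versions_of[triple].add(version)

    def match(
        self, s: Optional[str], p: Optional[str], o: Optional[str], version: int
    ) -> set[Triple]:
        """Triple-pattern lookup restricted to one version; None acts as a variable."""
        return {
            t
            for t, versions in self.versions_of.items()
            if version in versions
            and (s is None or t[0] == s)
            and (p is None or t[1] == p)
            and (o is None or t[2] == o)
        }

    def history(self, triple: Triple) -> list[int]:
        """All versions in which the given triple holds, in ascending order."""
        return sorted(self.versions_of.get(triple, set()))


if __name__ == "__main__":
    store = AnnotatedTripleStore()
    store.add_version(0, {("ex:Alice", "ex:worksFor", "ex:ACME")})
    store.add_version(1, {("ex:Alice", "ex:worksFor", "ex:Initech")})
    print(store.match("ex:Alice", "ex:worksFor", None, 0))
    print(store.history(("ex:Alice", "ex:worksFor", "ex:ACME")))
```

Keeping the annotation next to each triple makes queries on a single version a simple filter, at the cost of scanning every stored triple in this naive form; a real engine would index the annotations.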

RDF graphs
RDF graph archives
RDF dataset archives
SPARQL
Queries on archives
Framework for the evolution of RDF data
Low-level changes
High-level changes
Evolution analysis of RDF datasets
Low-level evolution analysis
High-level evolution analysis
Conclusion
Survey of RDF archiving solutions
RDF archiving systems
Change-based systems
Languages to query RDF archives
Benchmarks and tools for RDF archives
Evaluation of the related work
Functionality analysis
Performance analysis
Towards fully-fledged RDF archiving
Functionalities
Challenges
Conclusions