Towards fully-fledged archiving for RDF datasets

Olivier Pelgrin, Katja Hose, Luis Galárraga, Muhammad Saleem, Axel-Cyrille Ngonga Ngomo, Muhammad Intizar Ali, Olaf Hartig, Ruben Verborgh

doi:10.3233/sw-210434


Open Access

https://doi.org/10.3233/sw-210434


Abstract

The dynamicity of RDF data has motivated the development of solutions for archiving, i.e., the task of storing and querying previous versions of an RDF dataset. Querying the history of a dataset finds applications in data maintenance and analytics. Notwithstanding the value of RDF archiving, the state of the art in this field is under-developed: (i) most existing systems are neither scalable nor easy to use, (ii) there is no standard way to query RDF archives, and (iii) solutions do not exploit the evolution patterns of real RDF data. On these grounds, this paper surveys the existing works in RDF archiving in order to characterize the gap between the state of the art and a fully-fledged solution. It also provides RDFev, a framework to study the dynamicity of RDF data. We use RDFev to study the evolution of YAGO, DBpedia, and Wikidata, three dynamic and prominent datasets on the Semantic Web. These insights set the ground for the sketch of a fully-fledged archiving solution for RDF data.

Highlights

The amount of RDF data has steadily grown since the conception of the Semantic Web in 2001 [13], as more and more organizations opt for RDF [66] as the format to publish and manage semantic data [39,41].
We have conducted a study of the evolution of three large RDF knowledge bases using our proposed framework RDFev, which resorts to a domain-agnostic analysis from two perspectives: at the low level it studies the dynamics of triples and vocabulary terms across different versions of an RDF dataset, whereas at the high level it measures how those low-level changes translate into updates to the entities described in the experimental datasets.
While this still leaves us with Ostrich [75], Quit Store [8], R&WBase [79], R43ples [33] and x-RDF3X as testable solutions, only Ostrich [75] was able to run on our experimental datasets.
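
To make the two perspectives in the second highlight concrete, the following is a minimal sketch in Python. It is not RDFev's implementation: the `Triple` type, the function names, the `ex:` identifiers, and the choice to treat every subject as an entity are all assumptions made for illustration. It computes added and deleted triples between two versions (the low level) and then derives which entities were added, deleted, or updated (the high level).

```python
from typing import Iterable, NamedTuple


class Triple(NamedTuple):
    """Hypothetical in-memory triple; terms are plain strings here."""
    subject: str
    predicate: str
    obj: str


def low_level_changes(old: Iterable[Triple], new: Iterable[Triple]):
    """Return (added, deleted) triples between two versions of a graph."""
    old_set, new_set = set(old), set(new)
    return new_set - old_set, old_set - new_set


def vocabulary(triples: Iterable[Triple]) -> set:
    """Collect every term (subject, predicate, or object) used in a version."""
    terms = set()
    for triple in triples:
        terms.update(triple)
    return terms


def entity_changes(old: Iterable[Triple], new: Iterable[Triple]) -> dict:
    """Map low-level changes onto entities (assumption: entity = subject)."""
    old, new = list(old), list(new)
    added, deleted = low_level_changes(old, new)
    old_subjects = {t.subject for t in old}
    new_subjects = {t.subject for t in new}
    touched = {t.subject for t in added | deleted}
    return {
        "added": new_subjects - old_subjects,
        "deleted": old_subjects - new_subjects,
        # Present in both versions, but with at least one changed triple.
        "updated": touched & old_subjects & new_subjects,
    }


# Two toy versions of a dataset (made-up data).
v1 = [
    Triple("ex:Paris", "ex:population", "2100000"),
    Triple("ex:Paris", "ex:country", "ex:France"),
    Triple("ex:Lyon", "ex:country", "ex:France"),
]
v2 = [
    Triple("ex:Paris", "ex:population", "2200000"),
    Triple("ex:Paris", "ex:country", "ex:France"),
    Triple("ex:Nice", "ex:country", "ex:France"),
]

added, deleted = low_level_changes(v1, v2)
changes = entity_changes(v1, v2)
new_terms = vocabulary(v2) - vocabulary(v1)
```

On this toy input, two triples are added and two are deleted, while at the entity level `ex:Nice` is added, `ex:Lyon` is deleted, and `ex:Paris` is updated. Plain set differences are enough here because the versions fit in memory; datasets of the size studied in the paper would need a different strategy.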

Summary

Introduction

The amount of RDF data has steadily grown since the conception of the Semantic Web in 2001 [13], as more and more organizations opt for RDF [66] as the format to publish and manage semantic data [39,41]. In this case the metadata associated with the actual triples is used to answer domain-specific requirements. Despite this plethora of work, there is currently no available fully-fledged solution for the management of large and dynamic RDF datasets. This situation originates from multiple factors such as (i) the performance and functionality limitations of RDF engines to handle metadata, (ii) the absence of a standard for querying RDF archives, and (iii) a disregard of the actual evolution of real RDF data.

RDF graphs

RDF graph archives

RDF dataset archives

SPARQL

Queries on archives

Framework for the evolution of RDF data

Low-level changes

High-level changes

Evolution analysis of RDF datasets

Low-level evolution analysis

High-level evolution analysis

Conclusion

Survey of RDF archiving solutions

RDF archiving systems

Change-based systems Solutions based on the CB paradigm store a subset

Languages to query RDF archives

Benchmarks and tools for RDF archives

Evaluation of the related work

Functionality analysis

Performance analysis

Towards fully-fledged RDF archiving

Functionalities

Challenges

Conclusions


Journal: Semantic Web
Publication Date: Oct 4, 2021
Citations: 7
License type: CC BY 4.0
