A survey and experimental comparison of distributed SPARQL engines for very large RDF data

Ibrahim Abdelaziz, Panos Kalnis, Razen Harbi, Zuhair Khayyat

doi:10.14778/3151106.3151109

Abstract

Distributed SPARQL engines promise to support very large RDF datasets by utilizing shared-nothing computer clusters. Some are based on distributed frameworks such as MapReduce; others implement proprietary distributed processing; and some rely on expensive preprocessing for data partitioning. These systems exhibit a variety of trade-offs that are not well understood, because no comprehensive quantitative and qualitative evaluation exists. In this paper, we present a survey of 22 state-of-the-art systems that cover the entire spectrum of distributed RDF data processing and categorize them by several characteristics. Then, we select 12 representative systems and perform an extensive experimental evaluation with respect to preprocessing cost, query performance, scalability, and workload adaptability, using a variety of large synthetic and real datasets with up to 4.3 billion triples. Our results provide valuable insights that help practitioners understand the trade-offs for their usage scenarios. Finally, we publish our evaluation framework online, including all datasets and workloads, so that researchers can compare their novel systems against the existing ones.

Journal: Proceedings of the VLDB Endowment
Publication Date: Sep 1, 2017
Citations: 80
License type: cc-by-nc-nd

Similar Papers

Scalable Keyword Search on Large RDF Data
Wangchao Le ... Songyun Duan
IEEE Transactions on Knowledge and Data Engineering | VOL. 26
01 Nov 2014

Partitioned Indexes for Entity Search over RDF Knowledge Bases
Fang Du ... Yueguo Chen
01 Jan 2012

Efficient SPARQL Query Evaluation via Automatic Data Partitioning
Tao Yang ... Xiaoyan Wang
01 Jan 2013

CF-RDF: A Lightweight and Efficient Large Scale RDF Data Management System
Xiaozhe Li ... Guohua Yan
01 Dec 2020
