When is the Peak Performance Reached? An Analysis of RDF Triple Stores

Hashim Khan, Muhammad Saleem, Axel-Cyrille Ngonga Ngomo, Manzoor Ali

doi:10.3233/ssw210042

Abstract

With the significant growth in RDF datasets, application developers demand online availability of these datasets to meet end users' expectations. Various interfaces are available for querying RDF data using the SPARQL query language. Studies show that SPARQL endpoints may provide high query runtime performance at the cost of low availability. For example, it has been observed that only 32.2% of public endpoints have a monthly uptime of 99–100%. One possible reason for this low availability is the high workload experienced by these SPARQL endpoints. Because the complete query execution is performed on the server side (i.e., at the SPARQL endpoint), this high query processing workload may result in performance degradation or even a service shutdown. We performed extensive experiments to show the query processing capabilities of well-known triple stores by using their SPARQL endpoints. In particular, we stressed these triple stores with multiple parallel requests from different querying agents. Our experiments revealed the maximum query processing capabilities of these triple stores, beyond which additional load leads to service shutdowns. We hope this analysis will help triple store developers design workload-aware RDF engines that improve the availability of their public endpoints while maintaining high throughput.

Highlights

One of the basic requirements of many semantic web applications is the ability to access and query live linked data
We look for the key findings pertaining to the following research questions: (1) Which triple store achieves the highest throughput in terms of Queries per Second (QpS)? (2) On average, which triple store performs best? (3) What is the peak performance point of each of the selected triple stores, and when is it reached? (4) How do the triple stores scale with an increasing number of parallel querying agents? (5) At which point does Denial of Service (DoS) occur? (6) How do the systems scale with increasing dataset sizes? In the following, we discuss each of these key questions.
Denial of Service (DoS): Our results show that the throughput of the selected triple stores almost reaches zero when exposed to 128 querying agents
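The Queries per Second (QpS) measure and the notion of parallel querying agents used above can be illustrated with a minimal sketch. It is not the benchmark setup used in the paper: the function names, the fixed time budget per run, and the stub endpoint (a semaphore with a fixed number of worker slots standing in for a real SPARQL endpoint) are all assumptions made for illustration.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor


def run_agent(execute_query, queries, duration_s):
    """One querying agent: cycle through the query mix until the time budget ends.

    Returns the number of queries that completed without raising.
    """
    completed = 0
    i = 0
    deadline = time.monotonic() + duration_s
    while time.monotonic() < deadline:
        try:
            execute_query(queries[i % len(queries)])
            completed += 1
        except Exception:
            # Failed or timed-out queries do not count towards throughput.
            pass
        i += 1
    return completed


def measure_qps(execute_query, queries, n_agents, duration_s):
    """Run n_agents agents in parallel; return completed queries per second."""
    start = time.monotonic()
    with ThreadPoolExecutor(max_workers=n_agents) as pool:
        futures = [
            pool.submit(run_agent, execute_query, queries, duration_s)
            for _ in range(n_agents)
        ]
        total = sum(f.result() for f in futures)
    elapsed = time.monotonic() - start
    return total / elapsed


def make_stub_endpoint(capacity, service_time_s):
    """Hypothetical stand-in for an endpoint with `capacity` worker slots."""
    slots = threading.Semaphore(capacity)

    def execute_query(query):
        with slots:  # wait for a free slot, then "process" the query
            time.sleep(service_time_s)
        return []

    return execute_query


if __name__ == "__main__":
    endpoint = make_stub_endpoint(capacity=4, service_time_s=0.01)
    queries = ["SELECT * WHERE { ?s ?p ?o } LIMIT 1"]
    # Agent counts are illustrative; the paper reports results up to 128 agents.
    for n_agents in (1, 2, 4, 8, 16):
        qps = measure_qps(endpoint, queries, n_agents, duration_s=0.5)
        print(f"{n_agents:>3} agents: {qps:8.1f} QpS")
```

With this stub, throughput stops growing once the number of agents exceeds the number of slots, which is only a simplified picture of a peak performance point. To probe a real endpoint, `execute_query` would be replaced by a function that sends each query over HTTP, and a query timeout would be needed so that an unresponsive server is counted as failed queries rather than blocking the agents.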

Summary

Introduction

One of the basic requirements of many semantic web applications is the ability to access and query live linked data. The term “live queryable” linked data means that the data should be queryable via online SPARQL interfaces, without first downloading the entire knowledge graph and processing it locally to retrieve the desired information [27]. This is one of the most important requirements for the successful deployment of many linked data-based applications. Various interfaces, such as SPARQL endpoints and Triple Pattern Fragments (TPF), provide live SPARQL querying [27].

Objectives

Results

Conclusion