An Efficient Distributed SPARQL Query Processing Scheme Considering Communication Costs in Spark Environments

Jongtae Lim, Kyoungsoo Bok, Byounghoon Kim, Hyeonbyeong Lee, Dojin Choi, Jaesoo Yoo

doi:10.3390/app12010122

Open Access

https://doi.org/10.3390/app12010122

Abstract

Various distributed processing schemes have been studied to make efficient use of large-scale RDF graphs in semantic web services. This paper proposes a new distributed SPARQL query processing scheme that considers communication costs in Spark environments in order to reduce I/O costs during SPARQL query processing. To process a query over an RDF graph stored in a distributed environment, the scheme divides a SPARQL query into several subqueries based on its WHERE clause. It reduces data communication costs by using an index to group the divided subqueries on the related nodes and processing them there. For the grouped subqueries, it then calculates the cost of all possible query execution paths and selects an efficient one. The selection algorithm considers the data parsing cost, the amount of data communication, and the queue time per node of each candidate path. Various performance evaluations show that the proposed scheme outperforms the existing schemes.
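The steps named in the abstract (split by WHERE clause, group subqueries by node through an index, pick the cheapest execution path) can be illustrated with a minimal Python sketch. It is not the authors' algorithm: the abstract does not give the cost formula, so the additive cost model, the predicate-to-node index, the node names, and all numbers below are assumptions made only for illustration.

```python
import re
from itertools import permutations


def split_where(query):
    """Split the WHERE body into triple patterns, one subquery each.

    Naive split on ' .' separators; a real system would use a SPARQL parser.
    """
    body = re.search(r"WHERE\s*\{(.*)\}", query, re.S | re.I).group(1)
    return [t.strip() for t in re.split(r"\s\.(?:\s|$)", body) if t.strip()]


def group_by_node(patterns, predicate_index):
    """Group triple patterns by the node that the (hypothetical) index
    says holds the data for each pattern's predicate."""
    groups = {}
    for pattern in patterns:
        predicate = pattern.split()[1]
        groups.setdefault(predicate_index[predicate], []).append(pattern)
    return groups


def path_cost(path, parse_cost, queue_time, transfer):
    """Assumed cost model: per-node parsing cost and queue time, plus the
    cost of shipping intermediate results between consecutive nodes."""
    local = sum(parse_cost[n] + queue_time[n] for n in path)
    comm = sum(transfer[(a, b)] for a, b in zip(path, path[1:]))
    return local + comm


def best_path(nodes, parse_cost, queue_time, transfer):
    """Enumerate every ordering of the node groups and keep the cheapest."""
    return min(
        permutations(nodes),
        key=lambda p: path_cost(p, parse_cost, queue_time, transfer),
    )


# Illustrative inputs only; none of these values come from the paper.
query = """
SELECT ?name ?city WHERE {
  ?p foaf:name ?name .
  ?p ex:livesIn ?c .
  ?c ex:cityName ?city .
}
"""
predicate_index = {"foaf:name": "node1", "ex:livesIn": "node1", "ex:cityName": "node2"}
parse_cost = {"node1": 4.0, "node2": 2.0}
queue_time = {"node1": 1.0, "node2": 3.0}
transfer = {("node1", "node2"): 5.0, ("node2", "node1"): 2.0}

groups = group_by_node(split_where(query), predicate_index)
chosen = best_path(list(groups), parse_cost, queue_time, transfer)
print(chosen, path_cost(chosen, parse_cost, queue_time, transfer))
```

In this toy model the per-node terms are the same for every ordering, so the transfer term alone decides the result. The scheme described in the abstract may combine the three factors differently. Enumerating all orderings is also only practical for a small number of node groups.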

Highlights

The semantic web allows computers to understand and manipulate the meaning of documents [1,2,3]
We propose a distributed SPARQL query processing scheme for Spark environments that reduces the join and communication costs incurred during query processing, which are problems in existing schemes
All Resource Description Framework (RDF) graphs were stored in a single node, and a single node can process SPARQL queries more efficiently if they are processed through replication

Summary

Introduction

The semantic web allows computers to understand and manipulate the meaning of documents [1,2,3]. A scheme has been proposed that reduces the disk I/O cost during data parsing and the join cost during query processing in Spark, a distributed in-memory platform [43,44,45]. However, it does not consider the communication cost during query processing in a distributed environment, so processing queries over a large-scale RDF graph incurs a high cost [43]. We propose a distributed SPARQL query processing scheme for Spark environments that reduces the join and communication costs incurred during query processing, which are problems in existing schemes.

Related Work

The Proposed Distributed SPARQL Query Processing Scheme

Overall Architecture

Query Processing Procedure

Performance Evaluation

Findings

Conclusions

Journal: Applied Sciences
Publication Date: Dec 23, 2021
Citations: 2
License type: CC BY 4.0
