Abstract

Various distributed processing schemes have been studied to make efficient use of large-scale RDF graphs in semantic web services. This paper proposes a new distributed SPARQL query processing scheme for Spark environments that takes communication costs into account in order to reduce I/O costs during SPARQL query processing. To process a query over an RDF graph stored in a distributed environment, we divide the SPARQL query into several subqueries based on its WHERE clause. The proposed scheme reduces data communication costs by using an index to group the divided subqueries by related node and processing them there. For the grouped subqueries, it then calculates the cost of every possible query execution path and selects an efficient one. The selection algorithm considers the data parsing cost, the amount of data communication, and the queue time per node of each possible query execution path. Various performance evaluations show that the proposed scheme outperforms the existing schemes.
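The grouping and path selection steps described above can be illustrated with a minimal Python sketch. It is not the paper's implementation: the names `NodeStats`, `group_by_node`, `path_cost`, and `cheapest_path` are hypothetical, the index is assumed to map each predicate to the node that stores its triples, and the cost model is assumed to be a plain sum of parsing cost, queue time, and the size of each intermediate result shipped to the next node.

```python
from collections import defaultdict
from dataclasses import dataclass
from itertools import permutations


@dataclass(frozen=True)
class NodeStats:
    """Assumed per-node cost inputs; all values are illustrative."""
    parse_cost: float   # cost of parsing the data needed by this node's subqueries
    result_size: float  # size of the intermediate result shipped to the next node
    queue_time: float   # waiting time before this node can start work


def group_by_node(patterns, index):
    """Group triple patterns (s, p, o) by the node that the index assigns to p."""
    groups = defaultdict(list)
    for pattern in patterns:
        _, predicate, _ = pattern
        groups[index[predicate]].append(pattern)
    return dict(groups)


def path_cost(path, stats):
    """Sum parsing and queue costs, plus shipping for every node except the last."""
    total = 0.0
    for position, node in enumerate(path):
        total += stats[node].parse_cost + stats[node].queue_time
        if position + 1 < len(path):
            total += stats[node].result_size
    return total


def cheapest_path(nodes, stats):
    """Enumerate every node ordering and return the one with the lowest cost."""
    return min(permutations(nodes), key=lambda path: path_cost(path, stats))
```

Enumerating every ordering grows factorially with the number of node groups, so this sketch only suits queries that touch a handful of nodes. Under this assumed model the ordering affects only the shipping term, so the cheapest path ends on the node with the largest intermediate result.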

Highlights

  • The semantic web allows computers to understand and manipulate the meaning of documents [1,2,3]

  • We propose a distributed SPARQL query processing scheme for Spark environments that reduces the join and communication costs incurred during query processing, which are problems in existing schemes

  • All Resource Description Framework (RDF) graphs were stored in a single node, and a single node can process queries more efficiently if SPARQL queries are processed through replication


Summary

Introduction

The semantic web allows computers to understand and manipulate the meaning of documents [1,2,3]. A scheme has been proposed that reduces the disk I/O cost of data parsing and the join cost of query processing in Spark, a distributed in-memory platform [43,44,45]. However, it does not consider the communication cost of query processing in a distributed environment, so processing queries over large-scale RDF graphs incurs a high cost [43]. We propose a distributed SPARQL query processing scheme for Spark environments that reduces the join and communication costs incurred during query processing, which are problems in existing schemes.

Related Work
The Proposed Distributed SPARQL Query Processing Scheme
Overall Architecture
Query Processing Procedure
Performance Evaluation
Findings
Conclusions
