Abstract

The Semantic Web has recently gained traction through the use of Linked Open Data (LOD) on the Web. Although numerous state-of-the-art methodologies, standards, and technologies apply to the LOD cloud, many issues persist. Because the LOD cloud is built on graph-based resource description framework (RDF) triples and the SPARQL query language, traditional techniques employed in database management systems or distributed computing systems cannot be adopted directly. This paper addresses how the LOD cloud can be efficiently organized, retrieved, and evaluated. We propose a novel hybrid approach that combines the index and live exploration approaches to improve LOD join query performance. Using a two-step index structure that combines a disk-based 3D R*-tree with an extended multidimensional histogram and flash memory-based k-d trees, we can efficiently discover interlinked data distributed across multiple resources. Because this method rapidly prunes numerous false hits, join query processing performance improves markedly. We also propose a hot-cold segment identification algorithm to identify regions of high interest. The proposed method is compared with existing popular methods on real RDF datasets. The results indicate that our method outperforms the existing methods because it obtains target results quickly by reducing unnecessary data scanning and reduces the amount of main memory required to load filtering results.
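The idea of a two-step filter (coarse pruning first, exact checking second) can be illustrated with a minimal sketch. For illustration only, it assumes that each RDF triple is mapped to a point in 3D space by hashing its subject, predicate, and object, and that each data source is summarized by one bounding box. That box check stands in for the disk-based 3D R*-tree and extended multidimensional histogram, and a plain in-memory k-d tree stands in for the flash memory-based k-d trees. The names `Source`, `two_step_lookup`, and `COORD_RANGE` are hypothetical; this is not the paper's implementation.

```python
import hashlib
from dataclasses import dataclass
from typing import List, Optional, Tuple

Triple = Tuple[str, str, str]
Pattern = Tuple[Optional[str], Optional[str], Optional[str]]  # None = unbound variable
DIM = 3
COORD_RANGE = 1 << 16  # hypothetical size of each axis of the 3D space


def coord(term: str) -> int:
    """Hash an RDF term to an integer coordinate on one axis."""
    digest = hashlib.sha1(term.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big") % COORD_RANGE


def point(t: Triple) -> Tuple[int, int, int]:
    return (coord(t[0]), coord(t[1]), coord(t[2]))


@dataclass
class Node:
    pt: Tuple[int, int, int]
    triple: Triple
    axis: int
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def build(items, depth=0):
    """Build a k-d tree by splitting on the median along a cycling axis."""
    if not items:
        return None
    axis = depth % DIM
    items = sorted(items, key=lambda it: it[0][axis])
    mid = len(items) // 2
    pt, triple = items[mid]
    return Node(pt, triple, axis,
                build(items[:mid], depth + 1), build(items[mid + 1:], depth + 1))


def range_search(node, lo, hi, out):
    """Collect triples whose points fall inside the box [lo, hi]."""
    if node is None:
        return
    if all(lo[i] <= node.pt[i] <= hi[i] for i in range(DIM)):
        out.append(node.triple)
    a = node.axis
    if lo[a] <= node.pt[a]:  # left subtree holds coordinates <= node.pt[a]
        range_search(node.left, lo, hi, out)
    if node.pt[a] <= hi[a]:  # right subtree holds coordinates >= node.pt[a]
        range_search(node.right, lo, hi, out)


class Source:
    """One LOD source: a bounding box (coarse summary) plus a k-d tree."""

    def __init__(self, name: str, triples: List[Triple]):
        # Assumes a non-empty list of triples.
        self.name = name
        items = [(point(t), t) for t in triples]
        self.lo = tuple(min(p[i] for p, _ in items) for i in range(DIM))
        self.hi = tuple(max(p[i] for p, _ in items) for i in range(DIM))
        self.tree = build(items)


def two_step_lookup(sources: List[Source], pattern: Pattern):
    # Query box: a bound term fixes one coordinate; an unbound one spans the axis.
    lo = tuple(0 if t is None else coord(t) for t in pattern)
    hi = tuple(COORD_RANGE - 1 if t is None else coord(t) for t in pattern)
    results = []
    for src in sources:
        # Step 1: skip sources whose bounding box cannot contain a match.
        if any(lo[i] > src.hi[i] or hi[i] < src.lo[i] for i in range(DIM)):
            continue
        candidates: List[Triple] = []
        range_search(src.tree, lo, hi, candidates)
        # Step 2: exact comparison removes false hits caused by hash collisions.
        results.extend((src.name, t) for t in candidates
                       if all(q is None or q == v for v, q in zip(t, pattern)))
    return results
```

Because distinct terms can hash to the same coordinate, the range search may return false hits; the exact comparison at the end of each source's scan removes them before any results would be passed on to a join.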

Highlights

  • The evolution of the Linked Open Data (LOD) cloud has spurred a wave of research in Big Data [1]

  • We propose an efficient join query algorithm based on the two-step index structure for various SPARQL query types and a hot-cold segment identification algorithm that determines regions of high interest

  • Spurred by efforts such as the LOD project [22], large amounts of semantic data have been published in the resource description framework (RDF) format in diverse fields such as publishing, life sciences, social networking, the Internet of Things (IoT), and healthcare
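The summary does not describe how the hot-cold segment identification algorithm works, so the following is only a generic sketch of the idea under stated assumptions: the 3D coordinate space is divided into fixed-size grid cells ("segments"), each query window adds one unit of score per access to a segment, older scores decay by a constant factor, and segments whose score reaches a threshold are labeled hot. The class `HotColdTracker` and its `cell_size`, `decay`, and `threshold` parameters are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple

Point = Tuple[int, int, int]
Segment = Tuple[int, int, int]


class HotColdTracker:
    """Hypothetical decayed access counter over a grid of 3D segments."""

    def __init__(self, cell_size: int, decay: float = 0.5, threshold: float = 1.0):
        self.cell_size = cell_size
        self.decay = decay          # fraction of an old score kept each window
        self.threshold = threshold  # minimum score for a segment to be hot
        self.scores: Dict[Segment, float] = defaultdict(float)

    def segment_of(self, p: Point) -> Segment:
        return (p[0] // self.cell_size, p[1] // self.cell_size, p[2] // self.cell_size)

    def record_window(self, accessed: Iterable[Point]) -> None:
        # Age every existing score, then credit each access in this window.
        for seg in self.scores:
            self.scores[seg] *= self.decay
        for p in accessed:
            self.scores[self.segment_of(p)] += 1.0

    def hot_segments(self) -> Set[Segment]:
        return {seg for seg, s in self.scores.items() if s >= self.threshold}

    def is_hot(self, p: Point) -> bool:
        # .get does not insert a key into the defaultdict.
        return self.scores.get(self.segment_of(p), 0.0) >= self.threshold
```

The decay factor lets interest shift over time: a segment that stops being queried falls below the threshold after a few windows.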


Summary

Introduction

The evolution of the Linked Open Data (LOD) cloud has spurred a wave of research in Big Data [1]. The second approach accesses distributed data on the fly through a recursive URI lookup process; we call this the live exploration approach. This approach runs queries over multiple SPARQL endpoints offered by publishers for their LOD datasets [4]. It has several advantages: copied data need not be synchronized, searches are more dynamic because they run on up-to-date data, and new resources can be added without the time lag of indexing and integrating data.
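The recursive URI lookup behind the live exploration approach can be sketched as a breadth-first traversal that dereferences a URI, keeps the returned triples, and then follows URIs that appear in object position. In this minimal sketch the `dereference` callback is a hypothetical stand-in for an HTTP lookup (it can be backed by an in-memory dictionary for testing), and `max_depth` is an assumed bound that keeps the traversal finite; a real system would also send queries to the publishers' SPARQL endpoints.

```python
from collections import deque
from typing import Callable, List, Set, Tuple

Triple = Tuple[str, str, str]


def live_explore(seed: str,
                 dereference: Callable[[str], List[Triple]],
                 max_depth: int = 2) -> List[Triple]:
    """Breadth-first recursive URI lookup starting from `seed`."""
    seen: Set[str] = {seed}
    queue = deque([(seed, 0)])
    collected: List[Triple] = []
    while queue:
        uri, depth = queue.popleft()
        triples = dereference(uri)  # stand-in for fetching the URI's RDF description
        collected.extend(triples)
        if depth == max_depth:
            continue  # do not follow links beyond the depth bound
        for _, _, obj in triples:
            # Crude URI test: only follow objects that look like HTTP(S) URIs.
            if obj.startswith(("http://", "https://")) and obj not in seen:
                seen.add(obj)
                queue.append((obj, depth + 1))
    return collected
```

For example, with a toy mapping in which `http://a.example/Seoul` links to `http://b.example/Korea`, calling `live_explore` with `max_depth=1` collects the triples of both resources, while `max_depth=0` returns only the seed's own triples.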

Overview of Linked Open Data
Hybrid Storage Structure
Related Work
Two-Step SPARQL Query Processing
Performance of Hot-Cold Segment Identification Method
Conclusions and Future Work
Findings
22. Linking Open Data