Search Text to Retrieve Graphs: A Scalable RDF Keyword-Based Search System

Dennis Dosso, Gianmaria Silvello

doi:10.1109/access.2020.2966823

Abstract

Keyword-based access to structured data has been gaining traction both in research and industry as a means to facilitate access to information. In recent years, the research community and big data technology vendors have put much effort into developing new approaches for keyword search over structured data. Accessing these data through structured query languages, such as SQL or SPARQL, can be hard for end-users accustomed to Web-based search systems. To overcome this issue, keyword search in databases is becoming the technology of choice, although its efficiency and effectiveness problems still prevent a large scale diffusion. In this work, we focus on graph data, and we propose the TSA+BM25 and the TSA+VDP keyword search systems over RDF datasets based on the “virtual documents” approach. This approach enables high scalability because it moves most of the computational complexity off-line and then exploits highly efficient text retrieval techniques and data structures to carry out the on-line phase. Nevertheless, text retrieval techniques scale well to large datasets but need to be adapted to the complexity of structured data. The new approaches we propose are more efficient and effective compared to state-of-the-art systems. In particular, we show that our systems scale to work with RDF datasets composed of hundreds of millions of triples and obtain competitive results in terms of effectiveness.
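The division of work between the off-line and on-line phases described above can be illustrated with a minimal sketch. It is not the TSA+BM25 or TSA+VDP implementation: as a simplifying assumption, each "virtual document" here is just the text of all triples sharing a subject, and the ranking uses a textbook BM25 formula with common default parameters (k1 = 1.2, b = 0.75). The toy triples and the names `build_index`, `search`, `local_name`, and `tokenize` are hypothetical.

```python
import math
import re
from collections import Counter, defaultdict

# Toy RDF triples (hypothetical data, not taken from the paper).
TRIPLES = [
    ("ex:Padua", "rdfs:label", "Padua"),
    ("ex:Padua", "ex:locatedIn", "ex:Italy"),
    ("ex:Padua", "ex:description", "university city in northern Italy"),
    ("ex:Rome", "rdfs:label", "Rome"),
    ("ex:Rome", "ex:locatedIn", "ex:Italy"),
    ("ex:Rome", "ex:description", "capital city"),
    ("ex:Italy", "rdfs:label", "Italy"),
]


def local_name(term):
    """Drop a namespace prefix such as 'ex:' (assumes prefixed names)."""
    return term.split(":", 1)[-1]


def tokenize(text):
    """Lowercase the text and keep alphanumeric runs as tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(triples):
    """Off-line phase: group triples into subgraphs and index their text."""
    # Assumption: one subgraph per subject; the paper's aggregation differs.
    subgraphs = defaultdict(list)
    for s, p, o in triples:
        subgraphs[s].append((s, p, o))

    # Each virtual document is the bag of words of its subgraph.
    docs = {}
    for subject, graph in subgraphs.items():
        tokens = []
        for triple in graph:
            for term in triple:
                tokens.extend(tokenize(local_name(term)))
        docs[subject] = Counter(tokens)

    # Document frequency of each term and average document length.
    doc_freq = Counter()
    for tf in docs.values():
        doc_freq.update(tf.keys())
    avg_len = sum(sum(tf.values()) for tf in docs.values()) / len(docs)

    return {
        "subgraphs": dict(subgraphs),
        "docs": docs,
        "doc_freq": doc_freq,
        "avg_len": avg_len,
    }


def search(query, index, k1=1.2, b=0.75):
    """On-line phase: rank virtual documents for a keyword query with BM25."""
    n_docs = len(index["docs"])
    query_terms = tokenize(query)
    scores = {}
    for doc_id, tf in index["docs"].items():
        doc_len = sum(tf.values())
        score = 0.0
        for term in query_terms:
            if term not in tf:
                continue
            df = index["doc_freq"][term]
            idf = math.log(1 + (n_docs - df + 0.5) / (df + 0.5))
            norm = tf[term] + k1 * (1 - b + b * doc_len / index["avg_len"])
            score += idf * tf[term] * (k1 + 1) / norm
        if score > 0:
            scores[doc_id] = score
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    index = build_index(TRIPLES)
    for subject, score in search("university padua", index):
        # The answer returned to the user is the subgraph, not the text.
        print(subject, round(score, 3), index["subgraphs"][subject])
```

In this sketch all graph processing happens once in `build_index`, and answering a query only touches the term statistics, which is the property the virtual document approach relies on for scalability.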

Highlights

In the last decade, the Web of Data emerged as one of the principal means to access, share, and re-use structured data on the Web [1].
We propose two new keyword search systems over Resource Description Framework (RDF) graphs, based on the virtual document approach and following the best-match search paradigm: Topological Syntactical Aggregator+BM25 (TSA+BM25) and Topological Syntactical Aggregator+Virtual Documents Pruning (TSA+VDP).
An immediate consequence is that, while within the Cranfield framework the ground truth (GT) is usually built manually by human assessors who establish whether a document is relevant to a given topic, in keyword search over structured data this is not possible, since the documents to be judged are created on the fly and vary with the query and the search system employed.

Summary

Introduction

The Web of Data emerged as one of the principal means to access, share, and re-use structured data on the Web [1]. RDF is widely used for data publishing, accessing, and sharing since it allows flexible manipulation, enrichment, and discovery of data as well as encouraging interoperability. We have seen a significant increase in the number of knowledge bases published as large RDF graphs, such as DBpedia, Wikidata, and OpenPHACTS. The use of RDF is growing even for the representation and management of enterprise data with heterogeneous and very large graphs, often containing millions of edges [2]. Efficient and effective techniques to retrieve and access these data are of paramount importance to allow end-users to discover, consult, re-use, and share these resources.



Journal: IEEE Access	Publication Date: Jan 1, 2020
Citations: 12	License type: CC BY 4.0


