Abstract

Keyword-based access to structured data has been gaining traction in both research and industry as a means to facilitate access to information. In recent years, the research community and big data technology vendors have put much effort into developing new approaches for keyword search over structured data. Accessing these data through structured query languages, such as SQL or SPARQL, can be hard for end-users accustomed to Web-based search systems. To overcome this issue, keyword search in databases is becoming the technology of choice, although its efficiency and effectiveness problems still prevent its large-scale diffusion. In this work, we focus on graph data and propose TSA+BM25 and TSA+VDP, two keyword search systems over RDF datasets based on the "virtual documents" approach. This approach enables high scalability because it moves most of the computational complexity off-line and then exploits highly efficient text retrieval techniques and data structures to carry out the on-line phase. However, although text retrieval techniques scale well to large datasets, they need to be adapted to the complexity of structured data. The new approaches we propose are more efficient and effective than state-of-the-art systems. In particular, we show that our systems scale to RDF datasets composed of hundreds of millions of triples and obtain competitive results in terms of effectiveness.
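To make the off-line/on-line split of the virtual-documents approach concrete, here is a minimal sketch. It is not the TSA+BM25 or TSA+VDP implementation: it assumes, for illustration only, one virtual document per RDF subject built from the local names of its triples (TSA's topological aggregation and VDP's pruning are not reproduced), toy triples with a hypothetical `ex:` prefix, and textbook Okapi BM25 with k1 = 1.2, b = 0.75 and a Lucene-style non-negative idf. The names `build_virtual_documents` and `BM25Index` are hypothetical.

```python
import math
import re
from collections import Counter, defaultdict

# Toy RDF triples (subject, predicate, object); illustrative data only.
TRIPLES = [
    ("ex:Rome", "ex:label", "Rome"),
    ("ex:Rome", "ex:capitalOf", "ex:Italy"),
    ("ex:Italy", "ex:label", "Italy"),
    ("ex:Paris", "ex:label", "Paris"),
    ("ex:Paris", "ex:capitalOf", "ex:France"),
    ("ex:France", "ex:label", "France"),
]

def local_name(term):
    """Drop the prefix, e.g. "ex:capitalOf" -> "capitalOf"; literals are unchanged."""
    return term.split(":", 1)[-1]

def tokenize(text):
    """Split camelCase, lowercase, and split on non-alphanumeric characters."""
    text = re.sub(r"([a-z])([A-Z])", r"\1 \2", text)
    return [t for t in re.split(r"[^a-z0-9]+", text.lower()) if t]

def build_virtual_documents(triples):
    """Off-line phase (assumed policy): one virtual document per subject,
    made of the tokens of every triple having that subject."""
    docs = defaultdict(list)
    for s, p, o in triples:
        docs[s].extend(tokenize(" ".join(local_name(x) for x in (s, p, o))))
    return dict(docs)

class BM25Index:
    """In-memory Okapi BM25 over the virtual documents (on-line phase)."""

    def __init__(self, docs, k1=1.2, b=0.75):
        self.k1, self.b = k1, b
        self.tf = {d: Counter(toks) for d, toks in docs.items()}
        self.dl = {d: len(toks) for d, toks in docs.items()}
        self.n = len(docs)
        self.avgdl = sum(self.dl.values()) / self.n
        # Document frequency: iterating a Counter yields each distinct term once.
        self.df = Counter(t for counts in self.tf.values() for t in counts)

    def idf(self, term):
        df = self.df.get(term, 0)
        return math.log(1 + (self.n - df + 0.5) / (df + 0.5))

    def score(self, doc, terms):
        tf, dl = self.tf[doc], self.dl[doc]
        norm = self.k1 * (1 - self.b + self.b * dl / self.avgdl)
        total = 0.0
        for t in terms:
            f = tf.get(t, 0)
            if f:
                total += self.idf(t) * f * (self.k1 + 1) / (f + norm)
        return total

    def search(self, query, top_k=3):
        """Return ids of the top_k documents with a positive score."""
        terms = tokenize(query)
        scored = [(self.score(d, terms), d) for d in self.tf]
        scored = [x for x in scored if x[0] > 0]
        scored.sort(key=lambda x: (-x[0], x[1]))
        return [d for _, d in scored[:top_k]]

if __name__ == "__main__":
    index = BM25Index(build_virtual_documents(TRIPLES))
    print(index.search("capital of Italy"))
```

With these toy triples, the query "capital of Italy" ranks `ex:Rome` first because its virtual document contains all three query terms. All of the graph traversal happens in `build_virtual_documents`, so answering a query only touches the precomputed term statistics; this is the property the abstract credits for scalability.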

Highlights

  • In the last decade, the Web of Data emerged as one of the principal means to access, share, and re-use structured data on the Web [1]

  • We propose two new keyword search systems over Resource Description Framework (RDF) graphs based on the virtual document approach and following the best-match search paradigm: Topological Syntactical Aggregator+BM25 (TSA+BM25) and Topological Syntactical Aggregator+Virtual Document Pruning (TSA+VDP)

  • An immediate consequence is that, while within the Cranfield framework the ground truth (GT) is usually built manually by human assessors who establish whether a document is relevant to a given topic, in keyword search over structured data this is not possible, since the documents to be judged are created on the fly and vary with the query and the search system employed


Introduction

The Web of Data emerged as one of the principal means to access, share, and re-use structured data on the Web [1]. RDF is widely used for publishing, accessing, and sharing data, since it allows flexible manipulation, enrichment, and discovery of data and encourages interoperability. We have seen a significant increase in the number of knowledge bases published as large RDF graphs, such as DBpedia, Wikidata, and OpenPHACTS. The use of RDF is growing even for the representation and management of enterprise data, with heterogeneous and very large graphs often containing millions of edges [2]. Efficient and effective techniques to retrieve and access these data are of paramount importance to allow end-users to discover, consult, re-use, and share these resources.
