Keyword Search Algorithm over Large RDF Datasets

Yenier Torres Izquierdo

doi:10.1007/978-3-030-34146-6_21

Abstract

Keyword search tools have been used to query RDF data. They can be labeled as schema-based when the RDF schema is used to compile a keyword-based query into a SPARQL query, or graph-based when the RDF dataset is directly traversed or summarized. The approach proposed in the thesis belongs to the latter category. Unlike recent approaches that summarize the RDF graph, the proposed approach explores the similarity between the property domains and ranges and the class instance sets present in the RDF dataset. The approach estimates set similarity using the Jaccard and the set containment measures. To achieve good performance, even for large RDF datasets, the similarity measures are estimated based on k-Minimum hash Values (KMV) synopses [3]. This paper presents the research methodology to implement a keyword search algorithm over large RDF graphs that does not rely on schema information and uses KMV-synopses. The use of KMV-synopses, however, introduces new challenges. The research therefore includes the implementation of strategies to efficiently compute KMV-synopses for large RDF datasets and to keep them synchronized when the RDF dataset is updated, avoiding full re-computation of the synopses. Finally, the paper presents the status of the research, the open issues, and the roadmap to address them.
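To make the estimation step concrete, the following is a minimal Python sketch of how Jaccard similarity and set containment might be estimated from KMV synopses. It is not the implementation described in the paper: the hash function (SHA-1 mapped to the unit interval), the function names, and the choice of k are all assumptions made for illustration, and the estimators are the standard ones from the KMV literature. The input sets stand in for, say, the range of a property and the instance set of a class.

```python
import hashlib
import heapq


def hash01(value):
    """Map a value to a pseudo-uniform float in [0, 1) (assumed hash choice)."""
    digest = hashlib.sha1(str(value).encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") / 2**64


def kmv_synopsis(items, k):
    """Return the k smallest distinct hash values of the items, sorted ascending."""
    return sorted(heapq.nsmallest(k, {hash01(x) for x in items}))


def estimate_distinct(synopsis, k):
    """Estimate the number of distinct items behind a synopsis."""
    if len(synopsis) < k:
        # Fewer than k distinct values were seen, so the count is exact.
        return float(len(synopsis))
    # Standard KMV estimator: (k - 1) divided by the k-th smallest hash value.
    return (k - 1) / synopsis[k - 1]


def estimate_similarity(syn_a, syn_b, k):
    """Estimate (Jaccard(A, B), containment of A in B) from two synopses."""
    # The k smallest values of the merged synopses form a synopsis of A ∪ B.
    union = sorted(set(syn_a) | set(syn_b))[:k]
    if not union:
        return 0.0, 0.0
    shared = set(syn_a) & set(syn_b)
    k_shared = sum(1 for h in union if h in shared)
    jaccard = k_shared / len(union)
    # |A ∩ B| is estimated as Jaccard times the estimated size of A ∪ B.
    d_inter = jaccard * estimate_distinct(union, k)
    d_a = estimate_distinct(syn_a, k)
    containment = min(d_inter / d_a, 1.0) if d_a else 0.0
    return jaccard, containment


if __name__ == "__main__":
    k = 1024  # hypothetical synopsis size
    range_of_property = kmv_synopsis(range(0, 10000), k)
    instances_of_class = kmv_synopsis(range(5000, 15000), k)
    print(estimate_similarity(range_of_property, instances_of_class, k))
```

Each synopsis holds at most k hash values regardless of the size of the underlying set, which is what makes the comparison affordable on large datasets. Larger values of k reduce the estimation error at the cost of more storage per set.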

Full Text