Abstract

Semantically enhanced information retrieval (IR) aims to improve on classical IR methods and goes beyond plain Boolean keyword matching, with the main goal of better serving implicit and ambiguous information needs. As a de facto prerequisite to semantic IR, various information extraction (IE) techniques are used to mine unstructured text for underlying knowledge. In this paper we present a method that combines IE and IR to enable semantic search in natural language texts. First, we apply semantic role labeling (SRL) to automatically extract event-oriented information from natural language texts into an RDF knowledge graph, leveraging semantic web technology. Second, we investigate how a customized spreading activation algorithm, a form of graph traversal, can be employed to interpret users’ information needs on top of the previously extracted knowledge base. Finally, we assess the applicability of our method to semantically enhanced IR. An experimental evaluation on part of the WikiQA dataset shows the strengths of our approach and also reveals common pitfalls, which we use as guidelines for future work in open-domain semantic search.

Highlights

  • In the context of traditional web search, information retrieval (IR) is the task of obtaining documents relevant to a user’s information needs, typically expressed in the form of a query

  • The proposed semantically enhanced IR method was evaluated in two steps: first, by applying our semantic role labeling (SRL) Triple Extraction information extraction (IE) component to the full WikiQA [30] dataset, and second, by using the corresponding query set to observe how the spreading activation algorithm behaves on top of the extracted Resource Description Framework (RDF) knowledge graph

  • The SRL annotator skips the predicate “be.01”, which is a drawback when dealing with factoid-like questions in the WikiQA dataset, because the required triples are not asserted in the knowledge base during information extraction

Summary

Introduction

In the context of traditional web search, information retrieval (IR) is the task of obtaining documents relevant to a user’s information needs, typically expressed in the form of a query. As the target search space grows, more focus should be directed towards effective processing of document content in order to distinguish between new and repeated knowledge sources. Here we encounter another paradigm, known as information extraction (IE).

Semantic role labeling (SRL) produces predicate-argument structures such as:

P2: [A0: YouTube] [V: operate.01] [A3: as a subsidiary of Google]

Such a structure represents the shallow semantics of a sentence, where each predicate is accompanied by its main arguments (A0, A1, A2) and adjunctive arguments (AM-TMP, AM-LOC, AM-MNR). Since the natural ambiguity behind a user’s information needs and information sources cannot be covered by relying solely on shallow predicate-argument structures, deep semantic analysis of the resulting arguments is necessary.

The extracted knowledge is serialized using the Resource Description Framework (RDF), resulting in a directed labeled knowledge graph. Such a representation further allows query execution to be treated as a graph traversal task.
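To make the idea of turning a predicate-argument structure into a graph and then querying it by traversal more concrete, here is a minimal Python sketch. It is not the authors' SRL Triple Ontology or their customized algorithm: the dictionary layout of the frame, the `hasPredicate` property name, the function names, and the `decay` and `steps` parameters are all assumptions made for illustration. The spreading activation shown is a plain textbook variant, and edges are treated as undirected so that activation can flow both ways.

```python
from collections import defaultdict

# Hypothetical in-memory form of the P2 frame from the text (layout is assumed).
frame = {
    "predicate": "operate.01",
    "A0": "YouTube",
    "A3": "as a subsidiary of Google",
}


def frame_to_triples(frame_id, frame):
    """Reify one frame as (subject, property, object) triples.

    'hasPredicate' is an illustrative property name, not the paper's vocabulary.
    """
    triples = [(frame_id, "hasPredicate", frame["predicate"])]
    for role, filler in frame.items():
        if role != "predicate":
            triples.append((frame_id, role, filler))
    return triples


def build_graph(triples):
    """Build an undirected adjacency map (assumption: activation flows both ways)."""
    graph = defaultdict(set)
    for subject, _prop, obj in triples:
        graph[subject].add(obj)
        graph[obj].add(subject)
    return graph


def spread_activation(graph, seeds, decay=0.5, steps=2):
    """Generic spreading activation.

    In each step, every node activated in the previous step passes
    decay * its received activation, split evenly among its neighbours.
    """
    activation = {node: 1.0 for node in seeds}
    frontier = dict(activation)
    for _ in range(steps):
        next_frontier = defaultdict(float)
        for node, value in frontier.items():
            neighbours = graph.get(node, ())
            if not neighbours:
                continue
            share = decay * value / len(neighbours)
            for neighbour in neighbours:
                next_frontier[neighbour] += share
        for node, value in next_frontier.items():
            activation[node] = activation.get(node, 0.0) + value
        frontier = next_frontier
    return activation


triples = frame_to_triples("P2", frame)
graph = build_graph(triples)
scores = spread_activation(graph, seeds=["YouTube"])

# Rank nodes by accumulated activation, highest first.
for node, score in sorted(scores.items(), key=lambda item: -item[1]):
    print(f"{score:.4f}  {node}")
```

In this toy setting a query mentioning "YouTube" seeds the matching node, and the frame node P2 plus its other arguments receive progressively smaller activation. A real system would seed several query terms at once and rank answers by the activation they accumulate.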

Related Work
SRL Triple Ontology
Interpreting User’s Information Needs
Experimental Evaluation
Dataset Pre-processing
SRL Triple Extraction
Information Extraction Results
Information Retrieval Results
Comparison with TF-IDF
Conclusions
