Abstract

Question Answering based on Knowledge Graphs (KGQA) still faces difficult challenges when transforming natural language (NL) into SPARQL queries. Simple questions that refer to only one triple are answerable by most QA systems, but more complex questions, which require queries containing subqueries or several functions, remain a tough challenge in this field of research. Evaluation results of QA systems might therefore also depend on the benchmark dataset the system has been tested on. To give an overview and reveal specific characteristics, we examined currently available KGQA datasets with regard to several challenging aspects. This paper presents a detailed look into the datasets and compares them in terms of the challenges a KGQA system faces.
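The distinction between simple and complex queries drawn above can be illustrated with a small sketch. It is not the analysis method used in the paper: the two example queries are hypothetical and not taken from any of the surveyed datasets, the keyword list is an assumption about which constructs count as "functions", and the triple counting is a rough heuristic that only handles flat WHERE clauses with prefixed names.

```python
import re

# Hypothetical example queries; not taken from any surveyed dataset.
SIMPLE_QUERY = """
PREFIX dbo: <http://dbpedia.org/ontology/>
PREFIX dbr: <http://dbpedia.org/resource/>
SELECT ?capital WHERE { dbr:Germany dbo:capital ?capital . }
"""

COMPLEX_QUERY = """
PREFIX dbo: <http://dbpedia.org/ontology/>
SELECT ?country (COUNT(?city) AS ?n) WHERE {
  ?city dbo:country ?country .
  ?city dbo:populationTotal ?pop .
  FILTER(?pop > 1000000)
}
GROUP BY ?country
ORDER BY DESC(?n)
LIMIT 1
"""

# Constructs treated as signs of a complex query (an assumption of this sketch).
COMPLEX_KEYWORDS = ("FILTER", "COUNT", "GROUP BY", "ORDER BY",
                    "UNION", "OPTIONAL", "HAVING")


def where_body(query):
    """Return the text between the first '{' and the last '}'."""
    return query[query.index("{") + 1:query.rindex("}")]


def count_triple_patterns(query):
    """Roughly count triple patterns in a flat WHERE clause."""
    body = where_body(query)
    # Drop FILTER(...) expressions (no nested parentheses supported).
    body = re.sub(r"FILTER\s*\([^)]*\)", "", body, flags=re.IGNORECASE)
    # Statements end with a '.' that is surrounded by whitespace.
    statements = re.split(r"\s+\.(?=\s|$)", body)
    return sum(1 for s in statements if s.strip())


def keywords_used(query):
    """List the complexity keywords that occur as whole words."""
    found = []
    for kw in COMPLEX_KEYWORDS:
        pattern = r"\b" + r"\s+".join(kw.split()) + r"\b"
        if re.search(pattern, query, flags=re.IGNORECASE):
            found.append(kw)
    return found


def classify(query):
    """Label a query 'simple' if it has one triple and no keywords."""
    n_triples = count_triple_patterns(query)
    used = keywords_used(query)
    label = "simple" if n_triples == 1 and not used else "complex"
    return label, n_triples, used


if __name__ == "__main__":
    for name, q in (("simple", SIMPLE_QUERY), ("complex", COMPLEX_QUERY)):
        print(name, classify(q))
```

A real analysis would parse the queries with a proper SPARQL parser rather than regular expressions. The heuristic here only serves to make the one-triple criterion concrete.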

Highlights

  • Question answering (QA) aims at answering questions formulated in natural language on data sources and combines methods from natural language processing (NLP), linguistics, database processing, and information retrieval. Though early research activities were already conducted in the sixties, QA has received great attention again over the last few years.

  • Approaches based on semantic knowledge bases such as RDF knowledge graphs, which we refer to as Question Answering on Knowledge Graphs (KGQA) in the following, are a very promising idea because they can rely on large knowledge datasets such as DBpedia and simplify tasks such as mapping and disambiguation.

  • The questions provide only meager context, and disambiguation is apparently not successful in many cases. This experiment shows that the disambiguation process should not be considered before creating the SPARQL queries during the QA pipeline.


Summary

Introduction

Question answering (QA) aims at answering questions formulated in natural language on data sources and combines methods from natural language processing (NLP), linguistics, database processing, and information retrieval. Applications that transform natural language questions to formal queries on structured data can be summarized as the class of Natural Language Interfaces to Databases (NLIDB). Further datasets have been created and published to evaluate KGQA systems that transform NL to DBpedia-based SPARQL queries. In this work, we present a comparative survey of available datasets for KGQA. The intention of this survey is two-fold: – provide QA researchers with an overview of existing datasets, their structure and characteristics, and [...] These datasets are all based on the 2016 version of DBpedia. We analyze and compare these datasets in view of the following challenges to KGQA systems.

Related Work
LC-QuAD
SimpleDBpediaQA
Overview
Topic Definition
Analysis Description
Result Discussion
Lexical Gap
Complex Queries
Ontology Types
Answer Types
Findings
Discussion & Summary
