Abstract

With the proliferation of question answering (Q&A) services, studies on building a knowledge base (KB) from unstructured data on the Web using various information extraction (IE) methodologies have received significant attention. Existing IE approaches, including machine reading comprehension (MRC), can find the correct answer to a question if that answer exists in the document. However, most are prone to extracting incorrect answers rather than producing no answer when the correct answer does not exist in the given documents. This weakness can cause serious problems when such technologies are applied to practical services such as AI speakers. We propose a novel open-domain IE system that alleviates the weaknesses of previous approaches. The proposed system integrates elaborate document selection, sentence selection, and a knowledge extraction ensemble method to obtain high specificity while maintaining a realistically achievable level of precision. Based on this framework, we extract answers to Korean open-domain user queries from unstructured documents collected from multiple Web sources. To evaluate our system, we build a benchmark dataset from the SKTelecom AI Speaker log. The KYLIN infobox generator and BiDAF were used as baseline models. The experimental results demonstrate that the proposed method outperforms the baseline models and is practically applicable to real-world services.
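For concreteness, the sketch below shows the general shape of such a select-then-extract pipeline: documents are filtered first, then candidate sentences, and finally an extraction ensemble that is allowed to abstain instead of guessing. This is an illustrative outline only, not the authors' implementation; the function names, placeholder keyword-overlap scoring, and the confidence threshold are assumptions made for the example.

```python
# Illustrative sketch of a select-then-extract pipeline with abstention.
# All scoring logic is a placeholder (keyword overlap / fixed scores), chosen
# only to make the control flow runnable; it is not the paper's method.
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Extraction:
    answer: str
    score: float


def select_documents(query: str, corpus: List[str], top_k: int = 5) -> List[str]:
    """Stage 1: keep the documents most relevant to the query."""
    overlap = lambda text: sum(word in text for word in query.split())
    return sorted(corpus, key=overlap, reverse=True)[:top_k]


def select_sentences(query: str, documents: List[str], top_k: int = 10) -> List[str]:
    """Stage 2: narrow the selected documents down to candidate sentences."""
    sentences = [s.strip() for d in documents for s in d.split(".") if s.strip()]
    overlap = lambda text: sum(word in text for word in query.split())
    return sorted(sentences, key=overlap, reverse=True)[:top_k]


def extract_answers(query: str, sentences: List[str]) -> List[Extraction]:
    """Stage 3: an ensemble of extractors scores candidate answers.
    Placeholder: each candidate is the sentence itself with a fixed score."""
    return [Extraction(answer=s, score=0.5) for s in sentences]


def answer_query(query: str, corpus: List[str], threshold: float = 0.8) -> Optional[str]:
    """Return an answer only when the ensemble is confident; otherwise abstain."""
    documents = select_documents(query, corpus)
    candidates = extract_answers(query, select_sentences(query, documents))
    if not candidates:
        return None
    best = max(candidates, key=lambda c: c.score)
    return best.answer if best.score >= threshold else None
```

Abstaining (returning no answer) rather than always returning the best-scoring candidate is what lets such a system trade a little recall for the specificity emphasized in the abstract.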

Highlights

  • Formal knowledge bases (KBs), such as the Linked Open Data Cloud (LOD) [1], are used to express and share knowledge by connecting and assigning resources on the Web

  • The KB is a core element of question answering (Q&A) service systems and is considered an important research subject in artificial intelligence as a technology for storing and retrieving answers to user queries

  • Machine reading comprehension (MRC) might result in poor performance on unstructured Web documents because it cannot guarantee that the retrieved document contains the correct answer


Summary

Introduction

Formal knowledge bases (KBs), such as the Linked Open Data Cloud (LOD) [1], are used to express and share knowledge by connecting and assigning resources on the Web. Approaches for extracting such knowledge from unstructured Web documents (information extraction, IE) fall into three types. The first type requires an expert to create IE rules for a specific domain and extracts knowledge whenever a matching rule pattern is found in a document. The second type extracts information using supervised machine learning and deep learning models. The third type applies machine reading comprehension (MRC), in which information is extracted under the assumption that a correct answer exists in the document, as in the Stanford Question Answering Dataset (SQuAD) [2]. This assumption can result in poor performance on unstructured documents on the Web because there is no guarantee that a retrieved document contains the correct answer.
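As a minimal illustration of this limitation (the span scores below are hypothetical and not produced by any particular model), a SQuAD 1.1-style reader must always return its best-scoring span, so it will still "extract" something from a document that does not contain the answer, whereas a reader allowed to abstain can return no answer when every candidate is weak:

```python
# Hypothetical span scores for a question whose answer is NOT in the document.
span_scores = {"in 1998": 0.31, "Seoul": 0.22}

def read_forced(scores):
    """SQuAD 1.1-style reading: always return the best-scoring span."""
    return max(scores, key=scores.get)

def read_with_abstention(scores, no_answer_threshold=0.7):
    """Reader that may abstain when no span is convincing enough."""
    span, score = max(scores.items(), key=lambda kv: kv[1])
    return span if score >= no_answer_threshold else None

print(read_forced(span_scores))           # "in 1998"  -> a confident-looking wrong answer
print(read_with_abstention(span_scores))  # None       -> abstains instead of guessing
```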
