Enhancing Answer Selection via Ad-Hoc Knowledge Extraction from Unstructured Web Texts

Shengwei Gu, Xiangfeng Luo, Hao Wang

doi:10.1142/s0218194023500201

Abstract

Answer selection aims to identify the most relevant answers to a given question from a set of candidates. It is a fundamental component of intelligent question answering systems. To improve performance, integrating external structured knowledge bases (KBs) into the answer selection model has gradually become an effective strategy. However, because such KBs are expensive to construct and maintain, these models suffer from domain barriers and information incompleteness. In this paper, we propose a two-stage extraction–comprehension answer selection model that extracts ad-hoc knowledge from unstructured web texts to enhance the performance of answer selection. In the extraction stage, two types of snippets are extracted from unstructured web pages and used as the source of ad-hoc knowledge. In the comprehension stage, a selective attention mechanism extracts and integrates ad-hoc knowledge from the multiple text snippets obtained in the first stage, which enriches the representation of question–answer pairs and helps identify the correct answers more accurately. By incorporating ad-hoc knowledge extracted from both types of snippets, the proposed model achieves state-of-the-art results on two publicly available benchmark datasets. In particular, on WikiQA, in terms of the two evaluation metrics, mean average precision (MAP) and mean reciprocal rank (MRR), it scores 9.9% and 8.4% higher than previous non-pretraining-based models, and 3.4% and 3.2% higher than pretraining-based models.
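For readers unfamiliar with the two evaluation metrics named above, the following is a minimal sketch of how mean average precision and mean reciprocal rank are commonly computed for answer selection. It is not the authors' evaluation script. The function names and the toy data are hypothetical, and the sketch assumes each question's candidate answers have already been sorted by model score and reduced to 0/1 relevance labels. Questions with no correct candidate score 0.0 here, which is an assumption of this sketch; some evaluation setups drop such questions instead.

```python
from typing import Callable, List


def average_precision(labels: List[int]) -> float:
    """Average precision for one question.

    `labels` holds 0/1 relevance labels in ranked order (best-scored first).
    Returns 0.0 when no candidate is correct (an assumption of this sketch).
    """
    hits = 0
    precision_sum = 0.0
    for rank, label in enumerate(labels, start=1):
        if label:
            hits += 1
            precision_sum += hits / rank  # precision at this correct answer
    return precision_sum / hits if hits else 0.0


def reciprocal_rank(labels: List[int]) -> float:
    """Reciprocal of the rank of the first correct answer, or 0.0 if none."""
    for rank, label in enumerate(labels, start=1):
        if label:
            return 1.0 / rank
    return 0.0


def mean_over_questions(
    rankings: List[List[int]], metric: Callable[[List[int]], float]
) -> float:
    """Average a per-question metric over all questions."""
    return sum(metric(labels) for labels in rankings) / len(rankings)


# Hypothetical toy data: two questions with ranked 0/1 labels.
toy_rankings = [
    [0, 1, 0, 1],  # correct answers at ranks 2 and 4
    [1, 0, 0],     # correct answer at rank 1
]
map_score = mean_over_questions(toy_rankings, average_precision)
mrr_score = mean_over_questions(toy_rankings, reciprocal_rank)
print(f"MAP = {map_score:.3f}, MRR = {mrr_score:.3f}")
```

Both metrics depend only on the ranked positions of the correct answers. Mean average precision rewards ranking every correct answer highly, while mean reciprocal rank considers only the first one.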

Full Text