Abstract

Biomedical question answering (QA) has attracted growing attention from both industry and academia due to the crucial role of biomedical information. When mapping and ranking candidate snippet answers within relevant literature, current QA systems typically rely on information retrieval (IR) techniques, specifically query processing approaches and ranking models. However, these IR-based approaches fail to capture both syntactic and semantic relatedness and thus cannot formulate accurate natural language answers. Recently, deep learning approaches have proven effective at learning optimal semantic feature representations for natural language processing tasks. In this paper, we present a deep ranking recursive autoencoder (rankingRAE) architecture for ranking question-candidate snippet answer pairs (Q-S) to obtain the most relevant candidate answers for biomedical questions from potentially relevant documents. In particular, we convert the task of ranking candidate answers into several simultaneous binary classification tasks that determine whether a question and a candidate answer are relevant. The words of each concatenated Q-S pair, together with their randomly initialized vectors, are fed into recursive autoencoders to learn optimal semantic representations in an unsupervised way, and their semantic relatedness is then classified through supervised learning. Unlike existing methods that directly choose the top-K candidates with the highest probabilities, we take the influence of different ranking results into consideration. Consequently, we define a listwise "ranking error" in the loss function to penalize inappropriate answer rankings for each question and to eliminate their influence. The proposed architecture is evaluated on six years of BioASQ Biomedical Question Answering benchmarks (2013-2018). Compared with classical IR models, other deep representation models, and several state-of-the-art systems for these tasks, the experimental results demonstrate the robustness and effectiveness of rankingRAE.
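
The following is a minimal sketch, not the authors' released implementation, of the pipeline described above: a greedy recursive autoencoder composes the randomly initialized word vectors of a concatenated Q-S pair into a single root vector, and a logistic classifier scores that root as the probability that the snippet answers the question. All parameter and function names (W_enc, W_dec, w_cls, compose, encode_sequence, score_pair) and the greedy merging strategy are illustrative assumptions.

# Minimal sketch (not the authors' implementation) of the core rankingRAE idea.
import numpy as np

rng = np.random.default_rng(0)
DIM = 50  # embedding size; the paper uses randomly initialized word vectors

# Randomly initialized parameters (in practice trained by backpropagation).
W_enc = rng.normal(scale=0.1, size=(DIM, 2 * DIM))   # composition (encoder)
b_enc = np.zeros(DIM)
W_dec = rng.normal(scale=0.1, size=(2 * DIM, DIM))   # reconstruction (decoder)
b_dec = np.zeros(2 * DIM)
w_cls = rng.normal(scale=0.1, size=DIM)              # binary relevance classifier
b_cls = 0.0

def compose(c1, c2):
    """Encode two child vectors into a parent vector and return the
    reconstruction error used to decide which adjacent pair to merge."""
    parent = np.tanh(W_enc @ np.concatenate([c1, c2]) + b_enc)
    recon = np.tanh(W_dec @ parent + b_dec)
    err = np.sum((recon - np.concatenate([c1, c2])) ** 2)
    return parent, err

def encode_sequence(vectors):
    """Greedy unsupervised RAE: repeatedly merge the adjacent pair with the
    lowest reconstruction error until a single root vector remains."""
    nodes = list(vectors)
    while len(nodes) > 1:
        candidates = [compose(nodes[i], nodes[i + 1]) for i in range(len(nodes) - 1)]
        i = int(np.argmin([e for _, e in candidates]))
        nodes[i:i + 2] = [candidates[i][0]]
    return nodes[0]

def score_pair(question_vecs, snippet_vecs):
    """Probability that the snippet answers the question, computed from the
    RAE root vector of the concatenated Q-S word sequence."""
    root = encode_sequence(question_vecs + snippet_vecs)
    return 1.0 / (1.0 + np.exp(-(w_cls @ root + b_cls)))

# Toy usage: random word vectors stand in for the embeddings of a Q-S pair.
q = [rng.normal(size=DIM) for _ in range(6)]   # question words
s = [rng.normal(size=DIM) for _ in range(12)]  # candidate snippet words
print(f"relevance probability: {score_pair(q, s):.3f}")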

Highlights

  • Due to the continuous growth of information produced in the biomedical domain, there is a substantially growing demand for biomedical question answering (QA) from the general public, medical students, health care professionals and biomedical researchers [1]

  • The results show that our proposed approach outperforms several competitive baselines, including classical information retrieval (IR) models, variants of the proposed model with alternative vector representations, e.g., Convolutional Neural Networks (CNNs) and Long Short-Term Memory (LSTM), and state-of-the-art BioASQ participants

  • We present the performance of our proposed approach (https://github.com/lixuf/RAERecursive-AutoEncoder-for-bioasq-taskB-phaseA-snippets-retrieve-) and of variants that replace the vector representation model with self-implemented Convolutional Neural Networks (CNNs), Recurrent Neural Networks (RNNs), Long Short-Term Memory (LSTM), and recursive autoencoders without the ranking error (RAE)

Summary

Introduction

Due to the continuous growth of information produced in the biomedical domain, there is a substantially growing demand for biomedical QA from the general public, medical students, health care professionals and biomedical researchers [1]. One related line of work, by BioASQ participants, builds a model at the granularity of several random words and ranks at the sub-document level through a document retrieval model [20]. Another, by BioNLP participants, uses encoder technology to measure the relationship between questions and answers [21]. With the semantic vectors of Q-S pairs and supervised learning, the probabilities of Q-A relations are computed and ranked to select relevant snippet answers. The main contributions are: 1) proposing a novel approach that solves the snippet retrieval problem in biomedical QA with a classification model; 2) redesigning the loss function of RNNs to orient it toward ranking; and 3) providing a better solution for BioASQ. The extraction of snippets itself remains a major challenge in snippet retrieval.
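
Since this summary does not give the exact formulation of the listwise "ranking error", the sketch below only illustrates the general idea under stated assumptions: candidates are ranked by their predicted relevance probabilities, and the penalty counts (normalized) inversions where an irrelevant snippet is ranked above a relevant one within the top-K. The function name ranking_error and its normalization are illustrative, not the paper's definition.

# Hedged sketch of a listwise "ranking error" penalty for one question.
import numpy as np

def ranking_error(probs, labels, k=10):
    """probs: predicted relevance probabilities for all candidate snippets.
    labels: 1 if the snippet is a gold answer, 0 otherwise.
    Returns the number of (irrelevant, relevant) inversions within the top-k,
    normalized so the penalty lies in [0, 1]."""
    order = np.argsort(-np.asarray(probs))[:k]          # ranked candidate indices
    ranked_labels = np.asarray(labels)[order]
    inversions = 0
    for i in range(len(ranked_labels)):
        for j in range(i + 1, len(ranked_labels)):
            # an irrelevant snippet sits above a relevant one
            if ranked_labels[i] == 0 and ranked_labels[j] == 1:
                inversions += 1
    max_inv = max(1, int(ranked_labels.sum()) * int((ranked_labels == 0).sum()))
    return inversions / max_inv

# Toy usage: three relevant snippets among eight candidates.
probs  = [0.91, 0.15, 0.80, 0.40, 0.77, 0.05, 0.66, 0.30]
labels = [1,    0,    0,    1,    1,    0,    0,    0]
print(f"ranking error: {ranking_error(probs, labels, k=5):.3f}")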

