Abstract

The ever accelerating pace of biomedical research results in corresponding acceleration in the volume of biomedical literature created. Since new research builds upon existing knowledge, the rate of increase in the available knowledge encoded in biomedical literature makes the easy access to that implicit knowledge more vital over time. Toward the goal of making implicit knowledge in the biomedical literature easily accessible to biomedical researchers, we introduce a question answering system called Bio-AnswerFinder. Bio-AnswerFinder uses a weighted-relaxed word mover's distance based similarity on word/phrase embeddings learned from PubMed abstracts to rank answers after question focus entity type filtering. Our approach retrieves relevant documents iteratively via enhanced keyword queries from a traditional search engine. To improve document retrieval performance, we introduced a supervised long short term memory neural network to select keywords from the question to facilitate iterative keyword search. Our unsupervised baseline system achieves a mean reciprocal rank score of 0.46 and Precision@1 of 0.32 on 936 questions from BioASQ. The answer sentences are further ranked by a fine-tuned bidirectional encoder representation from transformers (BERT) classifier trained using 100 answer candidate sentences per question for 492 BioASQ questions. To test ranking performance, we report a blind test on 100 questions that three independent annotators scored. These experts preferred BERT based reranking with 7% improvement on MRR and 13% improvement on Precision@1 scores on average.

Highlights

  • While traditional information retrieval (IR) techniques employed by most search engines allow retrieval of documents deemed to be relevant to the keyword provided by a user, question answering systems provide precise answers to natural language questions

  • Four types of questions are provided by BioASQ; (1) “Yes/No” questions such as “Is miR-21 related to carcinogenesis?”, (2) Single factoid questions such as “which is the most common disease attributed to the malfunction or absence of primary cilia?”, (3) list factoid question such as “which human genes are more commonly related to craniosynostosis?” and (4) summary questions such as “what is the mechanism of action of abiraterone?.”

  • The answer reranking deep neural network (DNN) relies on the supervised data from manually selected answers from the weighted RWMD ranked sentences

Read more

Summary

Introduction

While traditional information retrieval (IR) techniques employed by most search engines allow retrieval of documents deemed to be relevant to the keyword provided by a user, question answering systems provide precise answers to natural language questions. Natural language questions together with precise answers allow more elaborate, nuanced and direct inquiries into the ever expanding body. BioASQ, an EU-funded biomedical semantic indexing and question answering challenge (2) provides accumulated sets of biomedical question/gold standard answer data each year since the inception of the challenge in 2013. We obtained the BioASQ 2017 Task 5b question answering training/development dataset for our system development and evaluation. A sentence provides the context around the factoid/list answer to interpret the validity of the answer

Methods
Results
Conclusion
Full Text
Paper version not known

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call

Disclaimer: All third-party content on this website/platform is and will remain the property of their respective owners and is provided on "as is" basis without any warranties, express or implied. Use of third-party content does not indicate any affiliation, sponsorship with or endorsement by them. Any references to third-party content is to identify the corresponding services and shall be considered fair use under The CopyrightLaw.