Abstract

Open-domain question answering (OpenQA) tasks have recently been attracting increasing attention from the natural language processing (NLP) community. In this work, we present the first free-form multiple-choice OpenQA dataset for solving medical problems, MedQA, collected from professional medical board exams. It covers three languages: English, simplified Chinese, and traditional Chinese, and contains 12,723, 34,251, and 14,123 questions for the three languages, respectively. We implement both rule-based and popular neural methods by sequentially combining a document retriever and a machine comprehension model. Through experiments, we find that even the current best method can only achieve 36.7%, 42.0%, and 70.1% test accuracy on the English, traditional Chinese, and simplified Chinese questions, respectively. We expect MedQA to present great challenges to existing OpenQA systems and hope that it can serve as a platform to promote much stronger OpenQA models from the NLP community in the future.
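
The retriever-plus-reader pipeline mentioned in the abstract can be sketched minimally as follows. This is only an illustration, not the paper's implementation: the corpus, question, answer options, and the token-overlap reader are hypothetical placeholders, with a TF-IDF retriever standing in for the document-retrieval step and a trivial rule-based scorer standing in for the machine comprehension model.

```python
# Minimal sketch of a retriever + reader pipeline for multiple-choice OpenQA.
# All data below is illustrative; the reader is a rule-based stand-in that a
# neural machine comprehension model would replace in practice.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

corpus = [
    "Metformin is a first-line drug for type 2 diabetes mellitus.",
    "Amoxicillin is an antibiotic used to treat bacterial infections.",
    "Insulin therapy is used when oral agents fail to control glucose.",
]
question = "Which drug is typically used first for type 2 diabetes?"
options = ["Amoxicillin", "Metformin", "Insulin", "Ibuprofen"]

# Retriever: rank corpus documents by TF-IDF cosine similarity to the question.
vectorizer = TfidfVectorizer().fit(corpus + [question])
doc_vecs = vectorizer.transform(corpus)
q_vec = vectorizer.transform([question])
scores = cosine_similarity(q_vec, doc_vecs)[0]
top_doc = corpus[scores.argmax()]

# Reader (rule-based stand-in): score each option by its lexical overlap
# with the retrieved evidence and pick the highest-scoring option.
def option_score(option, evidence):
    return sum(tok in evidence.lower() for tok in option.lower().split())

predicted = max(options, key=lambda o: option_score(o, top_doc))
print("Retrieved evidence:", top_doc)
print("Predicted answer:", predicted)
```

In the paper's setup the reader step is where rule-based and neural methods differ; the overall structure of retrieving evidence first and then scoring each candidate answer is the same.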

Highlights

  • Question answering (QA) is a fundamental task in Natural Language Processing (NLP), which requires models to answer a given question.

  • Models are required to find and extract information relevant to questions from large-scale text sources such as a search engine [9] and Wikipedia [10]. This type of task is generally called open-domain question answering (OpenQA), which has recently attracted considerable attention from the natural language processing (NLP) community [11,12,13] but still remains far from solved.

  • Most previous work on OpenQA focuses on datasets in which answers are spans that can be found based on information explicitly expressed in the provided text [9,10,14,15].

Summary

Introduction

Question answering (QA) is a fundamental task in Natural Language Processing (NLP), which requires models to answer a given question. Real-world scenarios for QA are usually much more complex, and one may not have a body of text already labeled as containing the answer to the question. In this setting, models are required to find and extract information relevant to the question from large-scale text sources such as a search engine [9] and Wikipedia [10]. As a more challenging task, free-form multiple-choice OpenQA datasets such as ARC [16] and OpenBookQA [17] contain a significant percentage of questions focusing on facts, events, opinions, or emotions expressed only implicitly in the retrieved text. To answer these questions, models need to perform logical reasoning over the information presented in the retrieved text and, in some cases, even need to integrate prior knowledge. These OpenQA datasets consist of questions that require only elementary- or middle-school-level knowledge (e.g., "Which object would let the most heat travel through?"), so even excellent models trained on them may be unable to support more sophisticated real-world scenarios.
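
To make the free-form multiple-choice format concrete, a single instance of this kind might be represented as below. The field names and the answer options are hypothetical placeholders, not the actual schema or content of ARC, OpenBookQA, or MedQA.

```python
# Hypothetical representation of one free-form multiple-choice OpenQA instance
# (illustrative only; not the actual schema of any of the cited datasets).
example = {
    "question": "Which object would let the most heat travel through?",
    "options": {
        "A": "a wooden spoon",
        "B": "a metal rod",
        "C": "a plastic fork",
        "D": "a rubber band",
    },
    "answer": "B",  # gold label; a system must select exactly one option
}

# Unlike span-extraction QA, the answer is not a substring of any provided
# passage: a system must retrieve evidence and score each option against it.
for label, text in example["options"].items():
    print(label, text)
```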
