Abstract

Because the amount of biomedical literature is increasing exponentially, biomedical experts and bio-curators are unable to quickly find short and precise information using typical search engines. The research community is therefore focusing on biomedical question answering (QA) systems so that anyone can find precise information nuggets in this massive body of literature. User queries generally fall into categories such as factoid, list, yes/no, or summary. The existing state-of-the-art question answering systems deal with most of these question types, but research on improving the performance of individual question types is also on the rise. Question classification plays a vital role in QA performance for factoid and list type questions, as it allows the answer processing stage to narrow down the candidate answer space and assign a higher rank to the correct answers. A single biomedical answer or entity may be associated with more than one biomedical category or semantic type, e.g., Coenzyme Q(10) is classified under two categories in the Unified Medical Language System (UMLS): organic chemical and biologically active substance. This inherent characteristic of biomedical entities makes question classification in the biomedical domain a multi-label classification problem, where one question might expect answers belonging to more than one semantic type. To the best of our knowledge, several QA systems treat question classification as a multi-class classification problem, and only one state-of-the-art system, OAQA, treats it as a multi-label classification problem. In this paper, we analyze the pipeline of the OAQA system for factoid and list type questions, emphasizing multi-label question classification. We use an improved question classification dataset with the copy transformation technique to improve the performance of list type questions.
Moreover, we introduce a binary transformation in the pipeline of factoid questions to improve their performance. Our modified methodology improves the performance of list type questions by a margin of 2% on the standard $F_{1}$ measure and of factoid type questions by a margin of 3% on the Mean Reciprocal Rank measure.
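
The two problem transformations named above can be illustrated with a minimal sketch. It uses the textbook definitions of the copy transformation (one single-label copy of a question per label) and assumes that "binary transformation" refers to binary relevance (one label-versus-rest dataset per label). It is not the OAQA pipeline or our implementation; the function names and the toy questions are hypothetical, and only the Coenzyme Q(10) label pair is taken from the text.

```python
def copy_transformation(examples):
    """Turn each multi-label example into one single-label copy per label."""
    return [
        (question, label)
        for question, labels in examples
        for label in sorted(labels)  # sorted only to make the output deterministic
    ]


def binary_relevance(examples):
    """Build one binary (label vs. rest) dataset per label.

    Assumption: this is what "binary transformation" denotes here.
    """
    all_labels = sorted({label for _, labels in examples for label in labels})
    return {
        label: [(question, label in labels) for question, labels in examples]
        for label in all_labels
    }


# Toy data: placeholder questions with UMLS-style semantic types as labels.
examples = [
    ("question A", {"organic chemical", "biologically active substance"}),
    ("question B", {"disease or syndrome"}),
]

single_label_data = copy_transformation(examples)
per_label_data = binary_relevance(examples)
```

Under the copy transformation a single multi-class classifier can be trained on `single_label_data`, whereas binary relevance trains one independent binary classifier per entry of `per_label_data`.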
