Abstract

With the outbreak of COVID-19 prompting an increased focus on self-care, more and more people hope to obtain disease knowledge from the Internet. In response to this demand, medical question answering and question generation have become important tasks in natural language processing (NLP). However, samples of medical questions and answers are limited, and existing question generation systems cannot fully meet non-professionals' needs for medical questions. In this research, we propose a medical question answering model built on a BERT pretraining model: GPT-2 is used for question augmentation, T5-Small for topic extraction, cosine similarity is computed between the extracted topics, and XGBoost makes the final prediction. With GPT-2 augmentation, the prediction accuracy of our model surpasses that of the state-of-the-art (SOTA) model. Our experimental results demonstrate the outstanding performance of our model on medical question answering and question generation tasks, and its great potential for solving other biomedical question answering challenges.
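The abstract describes a multi-stage pipeline (GPT-2 question augmentation, T5-Small topic extraction, cosine similarity between topic embeddings, and an XGBoost predictor). As a rough illustration only, the sketch below wires these stages together using the Hugging Face transformers, scikit-learn, and xgboost packages; the model checkpoints, prompts, mean-pooling embedding step, and single-feature layout are assumptions made for this example, not the authors' exact configuration.

```python
# Hedged sketch of the pipeline from the abstract. All model names, prompts,
# and feature choices are illustrative assumptions, not the paper's setup.
import numpy as np
import torch
from transformers import (AutoModel, AutoTokenizer,
                          GPT2LMHeadModel, GPT2Tokenizer,
                          T5ForConditionalGeneration, T5Tokenizer)
from sklearn.metrics.pairwise import cosine_similarity
from xgboost import XGBClassifier

# 1) Question augmentation with GPT-2: sample continuations of a seed question.
gpt2_tok = GPT2Tokenizer.from_pretrained("gpt2")
gpt2 = GPT2LMHeadModel.from_pretrained("gpt2")

def augment_question(seed, n=3):
    ids = gpt2_tok(seed, return_tensors="pt").input_ids
    outputs = gpt2.generate(ids, do_sample=True, top_p=0.9, max_length=48,
                            num_return_sequences=n,
                            pad_token_id=gpt2_tok.eos_token_id)
    return [gpt2_tok.decode(o, skip_special_tokens=True) for o in outputs]

# 2) Topic extraction with T5-Small, framed here as short summarization.
t5_tok = T5Tokenizer.from_pretrained("t5-small")
t5 = T5ForConditionalGeneration.from_pretrained("t5-small")

def extract_topic(text):
    ids = t5_tok("summarize: " + text, return_tensors="pt").input_ids
    out = t5.generate(ids, max_length=16)
    return t5_tok.decode(out[0], skip_special_tokens=True)

# 3) Embed extracted topics with a BERT encoder (mean pooling) and
#    compare question/answer topics by cosine similarity.
bert_tok = AutoTokenizer.from_pretrained("bert-base-uncased")
bert = AutoModel.from_pretrained("bert-base-uncased")

def embed(text):
    inputs = bert_tok(text, return_tensors="pt", truncation=True)
    with torch.no_grad():
        hidden = bert(**inputs).last_hidden_state
    return hidden.mean(dim=1).squeeze(0).numpy()

def topic_similarity(question, answer):
    q_vec = embed(extract_topic(question)).reshape(1, -1)
    a_vec = embed(extract_topic(answer)).reshape(1, -1)
    return float(cosine_similarity(q_vec, a_vec)[0, 0])

# 4) Feed similarity features to XGBoost for the final relevance prediction.
#    Toy data below stands in for features computed over (augmented)
#    question-answer pairs with topic_similarity().
X = np.array([[0.91], [0.34], [0.78], [0.12]])
y = np.array([1, 0, 1, 0])
clf = XGBClassifier(n_estimators=50, max_depth=3)
clf.fit(X, y)
print(clf.predict(np.array([[0.85]])))
```

In practice, augment_question() would enlarge the training set of medical questions before the similarity features are computed, and the XGBoost classifier would be trained on those features rather than the toy arrays shown here.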

Highlights

  • In recent years, human diseases and healthcare have received extensive attention

  • Since medical question answering (QA) systems based on natural language processing (NLP) play a critical role in improving the quality of current healthcare systems, developing accurate and robust medical QA models has become a research priority [7]

  • We propose a model for medical question answering based on BERT, Generative Pre-trained Transformer 2 (GPT-2) [19], and T5-Small [20], three of the latest variants of the Transformer architecture [21]


Summary

Introduction

In recent years, human diseases and healthcare have received extensive attention. Since medical question answering (QA) systems based on natural language processing (NLP) play a critical role in improving the quality of current healthcare systems, developing accurate and robust medical QA models has become a research priority [7].

