Abstract
Machine reading comprehension aims to enable computers to understand the semantics of a paragraph and to answer users' questions about it algorithmically. The quality of the dataset used for this task directly affects a model's experimental results. To enrich the medical-domain data available for machine reading comprehension, this paper constructs MedicalQA, a medical-domain machine reading comprehension dataset built with a combination of web crawling and manual annotation. The dataset draws on two medical platforms (Xunyiwenyao Network and 39 Health Network) as its main data sources and includes 19,502 paragraphs and Q&A pairs covering 9 medical departments, such as internal medicine, surgery, and obstetrics and gynecology. The dataset is formatted as an Excel file organized into five columns: the first column gives the paragraph ID; the second gives the department to which the paragraph belongs; the third contains the paragraph content; the fourth lists the question; and the fifth provides the corresponding answer. This dataset supports the development of machine reading comprehension models in the medical domain and can also promote the sharing of medical datasets in the field of machine reading comprehension.
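The five-column layout described above can be pictured as one record per row. The following is a minimal sketch rather than the authors' tooling: the field names, the `MedicalQARecord` class, and both helper functions are hypothetical, since the abstract specifies only the column order. It assumes the rows have already been read from the Excel file by some spreadsheet library, and the sample rows are made-up placeholders, not dataset content.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Iterable, List, Sequence


# Hypothetical field names; only the column order comes from the abstract.
@dataclass(frozen=True)
class MedicalQARecord:
    paragraph_id: str
    department: str
    paragraph: str
    question: str
    answer: str


def parse_rows(rows: Iterable[Sequence[object]]) -> List[MedicalQARecord]:
    """Convert five-column rows into records, skipping rows of any other width."""
    records = []
    for row in rows:
        if len(row) != 5:
            # Assumption: malformed rows are dropped rather than repaired.
            continue
        # Every cell is converted to a stripped string, including numeric IDs.
        records.append(MedicalQARecord(*(str(cell).strip() for cell in row)))
    return records


def count_by_department(records: Iterable[MedicalQARecord]) -> Counter:
    """Count how many records belong to each department."""
    return Counter(record.department for record in records)


# Made-up placeholder rows for illustration only.
sample_rows = [
    (1, "Internal medicine", "Example paragraph text.", "Example question?", "Example answer."),
    (2, "Surgery", "Another example paragraph.", "Another question?", "Another answer."),
    (3, "Surgery", "Incomplete row"),
]

records = parse_rows(sample_rows)
department_counts = count_by_department(records)
```

Keeping the parsing step separate from file reading means the same functions work whichever library supplies the rows, and the per-department count gives a quick check against the 9 departments the dataset is said to cover.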