Abstract

MEDLINE is the primary bibliographic database maintained by National Library of Medicine (NLM). MEDLINE citations are indexed with Medical Subject Headings (MeSH), which is a controlled vocabulary curated by the NLM experts. This greatly facilitates the applications of biomedical research and knowledge discovery. Currently, MeSH indexing is manually performed by human experts. To reduce the time and monetary cost associated with manual annotation, many automatic MeSH indexing systems have been proposed to assist manual annotation, including DeepMeSH and NLM's official model Medical Text Indexer (MTI). However, the existing models usually rely on the intermediate results of other models and suffer from efficiency issues. We propose an end-to-end framework, MeSHProbeNet (formerly named as xgx), which utilizes deep learning and self-attentive MeSH probes to index MeSH terms. Each MeSH probe enables the model to extract one specific aspect of biomedical knowledge from an input article, thus comprehensive biomedical information can be extracted with different MeSH probes and interpretability can be achieved at word level. MeSH terms are finally recommended with a unified classifier, making MeSHProbeNet both time efficient and space efficient. MeSHProbeNet won the first place in the latest batch of Task A in the 2018 BioASQ challenge. The result on the last test set of the challenge is reported in this paper. Compared with other state-of-the-art models, such as MTI and DeepMeSH, MeSHProbeNet achieves the highest scores in all the F-measures, including Example Based F-Measure, Macro F-Measure, Micro F-Measure, Hierarchical F-Measure and Lowest Common Ancestor F-measure. We also intuitively show how MeSHProbeNet is able to extract comprehensive biomedical knowledge from an input article.

Full Text
Published version (Free)

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call