Abstract
There are millions of articles in the PubMed database. To facilitate information retrieval, curators at the National Library of Medicine (NLM) assign a set of Medical Subject Headings (MeSH) to each article. MeSH is a hierarchically organized vocabulary of about 28K distinct concepts, covering fields from clinical medicine to information science. Several automatic MeSH indexing models have been developed to reduce the cost of time-consuming and expensive manual annotation. These include the NLM's official tool, the Medical Text Indexer (MTI), and DeepMeSH, the winner of the BioASQ Task 5a challenge. However, these models are complex and not interpretable. We propose AttentionMeSH, a novel end-to-end model that uses deep learning and an attention mechanism to index biomedical text with MeSH terms. The attention mechanism lets the model associate textual evidence with each annotation, which provides interpretability at the word level. The model also uses a novel masking mechanism to improve accuracy and speed. In the final week of the BioASQ Challenge Task 6a, we ranked 2nd by average MiF (micro-averaged F-measure) with a model that was still under construction. After the contest, our final model achieves close to state-of-the-art performance, with a MiF of ∼0.684. Human evaluations show that AttentionMeSH also provides a high level of interpretability: given a MeSH-article pair, its top 20 output words retrieve about 90% of all expert-labeled relevant words.
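To make the attention and masking ideas concrete, here is a minimal pure-Python sketch of label-wise attention restricted to a candidate set of labels. It is not the published AttentionMeSH architecture: the dot-product scoring, the sigmoid output, the function name `masked_label_attention` and the toy vectors are assumptions made for illustration. Each candidate label gets its own attention distribution over the words, so the highest-weighted words can be read off as evidence for that label, and labels outside the mask are never scored at all.

```python
import math


def softmax(scores):
    # Numerically stable softmax over a list of floats.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def masked_label_attention(word_vecs, label_vecs, candidates, top_k=20):
    """Hypothetical label-wise attention with a candidate mask (illustration only).

    word_vecs:  list of word representations, each a list of floats of length d
    label_vecs: dict mapping a label id to its embedding (length d)
    candidates: label ids left unmasked; every other label is skipped, which is
                where a mask could save work when only a small subset of the
                ~28K MeSH labels needs scoring
    Returns {label: (probability, indices of the top_k most-attended words)}.
    """
    results = {}
    for label in candidates:
        u = label_vecs[label]
        # One attention distribution over all words, specific to this label.
        weights = softmax([dot(u, h) for h in word_vecs])
        # Label-specific document vector: attention-weighted sum of word vectors.
        context = [sum(w * h[j] for w, h in zip(weights, word_vecs)) for j in range(len(u))]
        # Sigmoid of the label/context match gives an independent per-label probability.
        prob = 1.0 / (1.0 + math.exp(-dot(u, context)))
        # The highest-weighted words serve as word-level evidence for this label.
        ranked = sorted(range(len(word_vecs)), key=lambda i: weights[i], reverse=True)
        results[label] = (prob, ranked[:top_k])
    return results


# Toy example: hand-picked 2-d vectors, label "C" is masked out.
words = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
labels = {"A": [2.0, 0.0], "B": [0.0, 2.0], "C": [3.0, 3.0]}
out = masked_label_attention(words, labels, candidates=["A", "B"], top_k=2)
print(out["A"][1])  # [0, 2]
print(out["B"][1])  # [1, 2]
print("C" in out)   # False
```

In a trained model the word and label vectors would be learned; here they are chosen by hand so the attention ranking is easy to verify.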
Highlights
MEDLINE is a database containing more than 24 million biomedical journal citations as of 2018¹. PubMed provides free access to MEDLINE for researchers worldwide.
Selecting relevant Medical Subject Headings (MeSH) terms from neighboring articles can itself be regarded as a weak classifier, so a high-recall setting is favored in this step.
Since other models such as DeepMeSH and the Medical Text Indexer (MTI) do not report how to interpret their outputs, we use string matching as a non-trivial baseline.
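The two technical highlights above, candidate selection from neighbor articles and a string-matching baseline for word-level evidence, can be sketched in a few lines of Python. This is a minimal illustration under our own assumptions: the summed-similarity scoring, the token-overlap matching rule, the function names (`candidate_terms`, `string_match_evidence`, `word_recall`) and the toy data are hypothetical and are not taken from the AttentionMeSH paper.

```python
import re
from collections import defaultdict


def candidate_terms(neighbors, k):
    """Hypothetical candidate selection: rank neighbors' MeSH terms by summed similarity.

    neighbors: list of (similarity, set of MeSH terms) pairs for similar articles
    k: number of candidates kept; a generous k favors recall over precision
    """
    scores = defaultdict(float)
    for sim, terms in neighbors:
        for term in terms:
            scores[term] += sim
    # Sort by descending score; break ties alphabetically for determinism.
    ranked = sorted(scores, key=lambda t: (-scores[t], t))
    return ranked[:k]


def tokenize(text):
    # Lowercase alphanumeric tokens; punctuation is dropped.
    return re.findall(r"[a-z0-9]+", text.lower())


def string_match_evidence(text, term_names):
    """Baseline: positions of words that also occur in any name of the MeSH term."""
    vocab = {tok for name in term_names for tok in tokenize(name)}
    return [i for i, tok in enumerate(tokenize(text)) if tok in vocab]


def word_recall(predicted, expert):
    """Fraction of expert-labeled word positions that were retrieved."""
    expert = set(expert)
    if not expert:
        return 0.0
    return len(expert & set(predicted)) / len(expert)


# Toy example with made-up similarities and annotations.
neighbors = [(0.9, {"Humans", "Neoplasms"}), (0.5, {"Humans", "Mice"}), (0.2, {"Mice", "Rats"})]
print(candidate_terms(neighbors, 3))  # ['Humans', 'Neoplasms', 'Mice']

text = "Breast cancer cells were treated with tamoxifen."
hits = string_match_evidence(text, ["Breast Neoplasms", "Breast Cancer"])
print(hits)  # [0, 1]
print(word_recall(hits, [0, 1, 6]))
```

Matching single tokens is deliberately simple; it finds literal mentions such as "cancer" but cannot credit words like "tamoxifen" that support a heading without sharing its wording, which is the gap an attention-based method aims to close.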
Summary
MEDLINE is a database containing more than 24 million biomedical journal citations as of 2018¹. PubMed provides free access to MEDLINE for researchers worldwide. To facilitate information storage and retrieval, curators at the National Library of Medicine (NLM) assign a set of Medical Subject Headings (MeSH) to each article. MeSH² is a hierarchically organized terminology developed by NLM for indexing and cataloging biomedical texts such as MEDLINE articles. Indexers examine the full article and annotate it with MeSH terms according to rules set by NLM⁴. It is estimated that indexing an article costs $9.4 on average (Mork et al., 2013), and more than 813,500 citations were added to MEDLINE in 2017⁵. Indexing all of them manually would therefore cost roughly $7.6 million (813,500 × $9.4) in that year alone. Several automatic annotation models have been developed to reduce this time-consuming and expensive manual effort.