Abstract

Motivation: Entity linking is the task of linking entity mentions to their corresponding database entries. It allows superficially different but semantically identical mentions to be treated as the same entity. Because biomedical databases list millions of concepts, selecting the correct database entry for each targeted entity is challenging. Simple string matching between a mention and each synonym in biomedical databases cannot handle the wide variety of biomedical entity variants that appear in the biomedical literature. Recent progress in neural approaches is promising for entity linking. Still, existing neural methods require large amounts of training data, which is difficult to prepare for biomedical entity linking because the task deals with millions of biomedical concepts. Therefore, we need a new neural method that can train entity-linking models on sparse training data covering only a very limited part of the biomedical concepts.

Results: We have devised a pure neural model that classifies biomedical entity mentions into millions of biomedical concepts. The classifier employs (1) layer overwriting, which breaks through the performance ceiling during training, (2) training data augmentation using database entries, which compensates for insufficient training data, and (3) a cosine similarity-based loss function, which helps distinguish the millions of biomedical concepts. Our system using the proposed classifier was ranked first in the official run of the National NLP Clinical Challenges (n2c2) 2019 Track 3, which targeted linking medical/clinical entity mentions to 434,056 Concept Unique Identifier (CUI) entries. We also applied our system to the MedMentions dataset, which has 3.2M candidate concepts. Experimental results confirmed the same advantages of our proposed method. We further evaluated our system on the NLM-CHEM corpus with 350K candidate concepts, and our system achieved a new state-of-the-art performance on the corpus.

Availability: https://github.com/tti-coin/bio-linking

Contact: makoto.miwa@toyota-ti.ac.jp
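To make the idea of a cosine similarity-based loss more concrete, the following is a minimal sketch of one common formulation: a mention vector is scored against each concept vector by cosine similarity, the scores are scaled, and a softmax cross-entropy is taken over the candidates. This is an illustration only and is not taken from the authors' code; the function names, the `scale` parameter, the softmax form, and the toy two-dimensional vectors are all assumptions, and the loss used in the paper may differ.

```python
import math


def cosine(u, v, eps=1e-12):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    # eps guards against division by zero for all-zero vectors.
    return dot / max(norm_u * norm_v, eps)


def cosine_softmax_loss(mention, concepts, gold, scale=10.0):
    """Cross-entropy over scaled cosine similarities (hypothetical form).

    mention  -- vector for the entity mention
    concepts -- list of vectors, one per candidate concept
    gold     -- index of the correct concept
    scale    -- assumed temperature-like factor that sharpens the softmax
    """
    logits = [scale * cosine(mention, c) for c in concepts]
    # Log-sum-exp with the max subtracted for numerical stability.
    top = max(logits)
    log_z = top + math.log(sum(math.exp(x - top) for x in logits))
    return log_z - logits[gold]


def predict(mention, concepts):
    """Return the index of the concept most similar to the mention."""
    sims = [cosine(mention, c) for c in concepts]
    return max(range(len(sims)), key=sims.__getitem__)


# Toy example: three candidate concepts in a 2-D embedding space.
concepts = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]
mention = [2.0, 0.1]  # points almost in the direction of concept 0

best = predict(mention, concepts)
loss_correct = cosine_softmax_loss(mention, concepts, gold=0)
loss_wrong = cosine_softmax_loss(mention, concepts, gold=1)
```

Because cosine similarity ignores vector length, the score depends only on direction, so the loss is lower when the gold concept points the same way as the mention (`loss_correct`) than when it does not (`loss_wrong`). With millions of candidate concepts, a real system would compute these similarities as batched matrix operations rather than Python loops.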
