Abstract

Background: Automated assignment of specific ontology concepts to mentions in text is a critical task in biomedical natural language processing and the subject of many open shared tasks. Although the current state of the art involves the use of neural network language models as a post-processing step, the very large number of ontology classes to be recognized and the limited amount of gold-standard training data have impeded the creation of end-to-end systems based entirely on machine learning. Recently, Hailu et al. recast the concept recognition problem as a type of machine translation and demonstrated that sequence-to-sequence machine learning models have the potential to outperform multi-class classification approaches.

Methods: We systematically characterize the factors that contribute to the accuracy and efficiency of several approaches to sequence-to-sequence machine learning through extensive studies of alternative methods and hyperparameter selections. We not only identify the best-performing systems and parameters across a wide variety of ontologies but also provide insights into the widely varying resource requirements and hyperparameter robustness of alternative approaches. Analysis of the strengths and weaknesses of such systems suggests promising avenues for future improvements as well as design choices that can increase computational efficiency at small costs in performance.

Results: Bidirectional encoder representations from transformers for biomedical text mining (BioBERT) for span detection, along with the open-source toolkit for neural machine translation (OpenNMT) for concept normalization, achieves state-of-the-art performance for most ontologies annotated in the CRAFT Corpus. This approach uses substantially fewer computational resources, including hardware, memory, and time, than several alternative approaches.

Conclusions: Machine translation is a promising avenue for fully machine-learning-based concept recognition that achieves state-of-the-art results on the CRAFT Corpus, evaluated via a direct comparison to previous results from the 2019 CRAFT shared task. Experiments illuminating the reasons for the surprisingly good performance of sequence-to-sequence methods targeting ontology identifiers suggest that further progress may be possible by mapping to alternative target concept representations. All code and models can be found at: https://github.com/UCDenver-ccp/Concept-Recognition-as-Translation.
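To make this two-stage approach concrete, the minimal Python sketch below illustrates the data flow described above: a span detector marks concept mentions in a sentence, and a sequence-to-sequence "translator" maps each mention string, treated as a sequence of characters, to an ontology class identifier. The detector and normalizer here are toy placeholders standing in for the fine-tuned BioBERT tagger and the trained OpenNMT model, and the character-delimiting convention is an illustrative assumption rather than the exact input format used in the paper.

  # Minimal sketch of the two-stage pipeline: (1) detect concept-mention spans,
  # (2) "translate" each mention string into an ontology class identifier.
  # The detector and normalizer are toy placeholders for the fine-tuned
  # BioBERT tagger and trained OpenNMT model described in the paper.

  from typing import List, Tuple


  def to_char_seq(text: str) -> str:
      # Whitespace-delimit characters so a seq2seq toolkit can treat each
      # character as a token; real spaces become an explicit "<s>" symbol
      # (an illustrative convention, not necessarily the paper's).
      return " ".join("<s>" if ch == " " else ch for ch in text)


  def format_training_pair(mention: str, concept_id: str) -> Tuple[str, str]:
      # Parallel source/target lines: mention text on the source side,
      # ontology identifier on the target side, both as character sequences.
      return to_char_seq(mention.lower()), to_char_seq(concept_id)


  def detect_spans(sentence: str) -> List[Tuple[int, int]]:
      # Placeholder span detector; a BioBERT-based tagger would go here.
      spans, start = [], sentence.find("cell")
      if start != -1:
          spans.append((start, start + len("cell")))
      return spans


  def normalize_mention(mention: str) -> str:
      # Placeholder for the trained seq2seq model; here, a toy lookup.
      toy_model = {"cell": "CL:0000000"}
      return toy_model.get(mention.lower(), "NONE")


  if __name__ == "__main__":
      src, tgt = format_training_pair("cell", "CL:0000000")
      print("training example  src:", src, " tgt:", tgt)

      sentence = "Each cell was imaged after fixation."
      for start, end in detect_spans(sentence):
          mention = sentence[start:end]
          print(f"{mention!r} [{start}:{end}] -> {normalize_mention(mention)}")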

Highlights

  • Automated assignment of specific ontology concepts to mentions in text is a critical task in biomedical natural language processing, and the subject of many open shared tasks

  • Machine translation is a promising avenue for fully machine-learning-based concept recognition that achieves state-of-the-art results on the Colorado Richly Annotated Full-Text (CRAFT) Corpus, evaluated via a direct comparison to previous results from the 2019 CRAFT shared task

  • Bidirectional encoder representations from transformers for biomedical text mining (BioBERT) is a biomedical-specific language model, based on the original BERT architecture, pre-trained on biomedical documents from both PubMed abstracts and PubMed Central (PMC) full-text articles [15]



Introduction

Automated recognition of references to specific ontology concepts from mentions in text (hereafter "concept recognition") is a critical task in biomedical natural language processing (NLP) and has been the subject of many open shared tasks, including BioCreAtIvE [1], the BioNLP open shared tasks (BioNLP-OST) [2], and the recent COVID-19 open research dataset challenge [3]. All of these shared tasks provide data, evaluation details, and a community of researchers, making them very useful frameworks for further development of such tasks. Our analysis of the strengths and weaknesses of such systems suggests promising avenues for future improvements as well as design choices that can increase computational efficiency at a small cost in performance.

