Knowledge-based biomedical word sense disambiguation: comparison of approaches

Antonio J Jimeno-Yepes, Alan R Aronson

doi:10.1186/1471-2105-11-569

Open Access

https://doi.org/10.1186/1471-2105-11-569

Journal: BMC Bioinformatics
Publication Date: Nov 22, 2010
Citations: 83
License type: CC BY 2.0

Affiliation: United States National Library of Medicine

Abstract

Background: Word sense disambiguation (WSD) algorithms attempt to select the proper sense of ambiguous terms in text. Resources like the UMLS provide a reference thesaurus that can be used to annotate the biomedical literature. Statistical learning approaches have produced good results, but the size of the UMLS makes it infeasible to produce training data that covers the whole domain.

Methods: We present research on existing WSD approaches based on knowledge bases, which complements the studies performed on statistical learning. We compare four approaches that rely on the UMLS Metathesaurus as the source of knowledge. The first approach compares the overlap between the context of the ambiguous word and each candidate sense, using a representation of the sense built from its definitions, synonyms and related terms. The second approach collects training data for each candidate sense: queries built from monosemous synonyms and related terms are used to retrieve MEDLINE citations, and a machine learning approach is then trained on this corpus. The third approach is a graph-based method that exploits the structure of the Metathesaurus network of relations to perform unsupervised WSD; it ranks nodes in the graph according to their relative structural importance. The fourth approach uses the semantic types assigned to the concepts in the Metathesaurus. The context of the ambiguous word and the semantic types of the candidate concepts are mapped to Journal Descriptors, and these mappings are compared to decide among the candidate concepts. Results are reported as the accuracy of the different methods on the WSD test collection available from the NLM.

Conclusions: We have found that the fourth approach achieves better results than the other methods. The graph-based approach, which uses the structure of the Metathesaurus network to estimate the relevance of the Metathesaurus concepts, does not perform well compared to the first two methods. In addition, combining the methods improves performance over the individual approaches. On the other hand, performance is still below that of statistical learning trained on manually produced data and below the maximum frequency sense baseline. Finally, we propose several directions to improve the existing methods and to make the Metathesaurus more effective for WSD.
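As a rough illustration of the first approach summarized in the abstract (overlap between the context of the ambiguous word and a representation of each candidate sense), the following minimal sketch may help. It is not the authors' implementation: the bag-of-words profiles, the cosine similarity measure, the tokenizer, and the toy sense texts are all assumptions made for this example, and a real system would build the profiles from Metathesaurus definitions, synonyms and related terms.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a, b):
    """Cosine similarity between two bag-of-words Counters."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def disambiguate(context, sense_profiles):
    """Return the sense whose profile overlaps most with the context.

    sense_profiles maps a sense label to a text profile (hypothetical
    stand-in for definitions, synonyms and related terms).
    """
    ctx = Counter(tokenize(context))
    scores = {
        sense: cosine(ctx, Counter(tokenize(profile)))
        for sense, profile in sense_profiles.items()
    }
    best = max(scores, key=scores.get)
    return best, scores


# Toy, hand-written profiles for the ambiguous word "cold" (illustrative only).
SENSE_PROFILES = {
    "cold_temperature": "cold temperature low temperature freezing weather exposure",
    "common_cold": "common cold viral infection upper respiratory tract cough rhinovirus",
}

best_sense, sense_scores = disambiguate(
    "The patient developed a cough and a viral infection after the cold.",
    SENSE_PROFILES,
)
print(best_sense)
```

In this toy example the context shares more terms with the second profile, so that sense is selected. The choice of similarity measure and term weighting is a design decision the abstract does not specify.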

Highlights

Results using the NLM word sense disambiguation (WSD) data set are presented in terms of accuracy, defined in equation 3, where an instance is an example of an ambiguous word to disambiguate.
One baseline is the maximum frequency sense (MFS), which is standard in WSD evaluation.
We find that the Journal Descriptor Indexing (JDI) method has the best performance for a single method.
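The highlights above refer to accuracy and to the maximum frequency sense baseline. Equation 3 is not reproduced on this page, so the sketch below assumes the usual definition of accuracy (correctly disambiguated instances divided by all instances) and takes MFS to mean assigning every instance of an ambiguous word its most frequent sense in the annotated data. The function names and the sense labels are hypothetical.

```python
from collections import Counter


def accuracy(predicted, gold):
    """Fraction of instances whose predicted sense equals the gold sense.

    Assumed definition: correct instances / total instances.
    """
    if len(predicted) != len(gold):
        raise ValueError("predicted and gold must have the same length")
    if not gold:
        return 0.0
    correct = sum(p == g for p, g in zip(predicted, gold))
    return correct / len(gold)


def mfs_baseline(gold):
    """Predict the most frequent gold sense for every instance of one word."""
    most_common_sense, _ = Counter(gold).most_common(1)[0]
    return [most_common_sense] * len(gold)


# Hypothetical gold and predicted senses for four instances of one ambiguous word.
gold_senses = ["M1", "M1", "M2", "M1"]
predicted_senses = ["M1", "M2", "M2", "M1"]

method_accuracy = accuracy(predicted_senses, gold_senses)
baseline_accuracy = accuracy(mfs_baseline(gold_senses), gold_senses)
print(method_accuracy, baseline_accuracy)
```

Because the MFS baseline is computed from the annotated senses themselves, it can be hard to beat when one sense dominates, which is consistent with the abstract's remark that the knowledge-based methods remain below it.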

Summary

Introduction

Word sense disambiguation (WSD) algorithms attempt to select the proper sense of ambiguous terms in text. Resources like the UMLS provide a reference thesaurus to be used to annotate the biomedical literature. Improvement in WSD will help, for instance, to produce better annotation tools like MetaMap [1] and to improve automatic indexing [2] and other text mining tasks. The examples are sentences from citations in MEDLINE®, the largest bibliographic database in the biomedical domain, which holds citations from around 5,000 journals; each example is given with its PUBMED® identifier (PMID).
