Collocation analysis for UMLS knowledge-based word sense disambiguation.

Antonio Jimeno-Yepes, Bridget T McInnes, Alan R Aronson

doi:10.1186/1471-2105-12-s3-s4

Open Access

https://doi.org/10.1186/1471-2105-12-s3-s4

Journal: BMC Bioinformatics
Publication Date: Jun 9, 2011
Citations: 34
License type: CC BY 2.0

Affiliation: United States National Library of Medicine

Abstract

Background

The effectiveness of knowledge-based word sense disambiguation (WSD) approaches depends in part on the information available in the reference knowledge resource. Off the shelf, these resources are not optimized for WSD and might lack terms to model the context properly. In addition, they might include noisy terms which contribute to false positives in the disambiguation results.

Methods

We analyzed some collocation types which could improve the performance of knowledge-based disambiguation methods. Collocations are obtained by extracting candidate collocations from MEDLINE and then assigning them to one of the senses of an ambiguous word. We performed this assignment either using semantic group profiles or a knowledge-based disambiguation method. In addition to collocations, we used second-order features from a previously implemented approach.

Specifically, we measured the effect of these collocations in two knowledge-based WSD methods. The first method, AEC, uses the knowledge from the UMLS to collect examples from MEDLINE which are used to train a Naïve Bayes approach. The second method, MRD, builds a profile for each candidate sense based on the UMLS and compares the profile to the context of the ambiguous word.

We have used two WSD test sets which contain disambiguation cases which are mapped to UMLS concepts. The first one, the NLM WSD set, was developed manually by several domain experts and contains words with high frequency occurrence in MEDLINE. The second one, the MSH WSD set, was developed automatically using the MeSH indexing in MEDLINE. It contains a larger set of words and covers a larger number of UMLS semantic types.

Results

The results indicate an improvement after the use of collocations, although the approaches have different performance depending on the data set. In the NLM WSD set, the improvement is larger for the MRD disambiguation method using second-order features. Assignment of collocations to a candidate sense based on UMLS semantic group profiles is more effective in the AEC method.

In the MSH WSD set, the increment in performance is modest for all the methods. Collocations combined with the MRD disambiguation method have the best performance. The MRD disambiguation method and second-order features provide an insignificant change in performance. The AEC disambiguation method gives a modest improvement in performance. Assignment of collocations to a candidate sense based on knowledge-based methods has better performance.

Conclusions

Collocations improve the performance of knowledge-based disambiguation methods, although results vary depending on the test set and method used. Generally, the AEC method is sensitive to query drift. Using AEC, just a few selected terms provide a large improvement in disambiguation performance. The MRD method handles noisy terms better but requires a larger set of terms to improve performance.
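
The profile comparison behind the MRD method can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not say how profiles and contexts are compared, so cosine similarity over bag-of-words counts is an assumption here, and the sense labels, profile words, and the function names `bag_of_words`, `cosine`, and `disambiguate` are hypothetical.

```python
import math
from collections import Counter


def bag_of_words(text):
    """Lowercased, whitespace-tokenized term counts (a deliberately naive tokenizer)."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def disambiguate(context, sense_profiles):
    """Return the sense whose profile is most similar to the context, plus all scores."""
    ctx = bag_of_words(context)
    scores = {sense: cosine(ctx, profile) for sense, profile in sense_profiles.items()}
    return max(scores, key=scores.get), scores


# Hypothetical profiles for two senses of "cold"; a real profile would be
# built from the terms the knowledge resource links to each candidate concept.
profiles = {
    "cold_temperature": bag_of_words("cold temperature low freezing weather"),
    "common_cold": bag_of_words("cold viral infection upper respiratory"),
}

context = "patients with cold symptoms and cough"
_, scores_before = disambiguate(context, profiles)

# Assumed collocations assigned to the "common_cold" sense extend its profile.
profiles["common_cold"].update(bag_of_words("symptoms cough"))
best, scores_after = disambiguate(context, profiles)
print(best, scores_after)
```

Before the collocations are added, both toy profiles share only the word "cold" with the context, so the two senses score the same; the added terms are what separate them. This mirrors, in a very small way, the idea that a resource used off the shelf may lack the terms needed to model the context.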

Highlights

Word sense disambiguation (WSD) is an intermediate task within information retrieval and information extraction, attempting to select the proper sense of ambiguous words.
The semantic group profiles used to assign collocations to candidate senses obtain high accuracy on the National Library of Medicine (NLM) WSD set, but add noise on the MSH WSD set.
The semantic group approach works reasonably well on the NLM WSD set but performs worse on the MSH WSD set; the contrary is true for the Automatic Extracted Corpus (AEC) categorization.

Summary

Introduction

Word sense disambiguation (WSD) is an intermediate task within information retrieval and information extraction, attempting to select the proper sense of ambiguous words. For example, the word "cold" could refer either to low temperature or to the viral infection. Existing knowledge sources, such as the Unified Medical Language System (UMLS)® [1,2], are used to annotate terms in text. The effectiveness of knowledge-based WSD approaches depends in part on the information available in the reference knowledge resource. Off the shelf, these resources are not optimized for WSD and might lack terms to model the context properly. Each concept is assigned a unique identifier (CUI), which is linked to a set of terms that denote alternative ways to represent the concept in text. Depending on availability, these terms are represented in several languages; only English terms are used in this work. All the information about a concept can be traced back to the resource from which it was collected.
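
The concept structure described above can be pictured with a small data model. This is an illustrative sketch only, not the UMLS distribution format: the class names `Term` and `Concept`, the identifier `C0000000`, and the source labels `SOURCE_A` and `SOURCE_B` are hypothetical placeholders, and the language codes are assumed three-letter codes.

```python
from dataclasses import dataclass, field


@dataclass
class Term:
    text: str       # one way of expressing the concept in text
    language: str   # assumed language code, e.g. "ENG"
    source: str     # resource the term was collected from (placeholder labels here)


@dataclass
class Concept:
    cui: str                                   # concept unique identifier
    terms: list = field(default_factory=list)  # alternative terms for the concept

    def english_terms(self):
        """Keep only the English terms, as done in this work."""
        return [t.text for t in self.terms if t.language == "ENG"]


# Hypothetical record; the identifier and source labels are placeholders.
concept = Concept(
    cui="C0000000",
    terms=[
        Term("common cold", "ENG", "SOURCE_A"),
        Term("resfriado común", "SPA", "SOURCE_B"),
    ],
)
english = concept.english_terms()
print(english)
```

Keeping the source on each term is what allows the information about a concept to be traced back to the resource it came from.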
