Abstract

Automatic processing of biomedical documents is made difficult by the fact that many of the terms they contain are ambiguous. Word Sense Disambiguation (WSD) systems attempt to resolve these ambiguities and identify the correct meaning. However, the published literature on WSD systems for biomedical documents reports considerable differences in performance for different terms. The development of WSD systems is often expensive with respect to acquiring the necessary training data. It would therefore be useful to be able to predict in advance which terms WSD systems are likely to perform well or badly on. This paper explores various methods for estimating the performance of WSD systems on a wide range of ambiguous biomedical terms (including ambiguous words/phrases and abbreviations). The methods include both supervised and unsupervised approaches. The supervised approaches make use of information from labeled training data while the unsupervised ones rely on the UMLS Metathesaurus. The approaches are evaluated by comparing their predictions about how difficult disambiguation will be for ambiguous terms against the output of two WSD systems. We find the supervised methods are the best predictors of WSD difficulty, but are limited by their dependence on labeled training data. The unsupervised methods all perform well in some situations and can be applied more widely.

Highlights

  • Word Sense Disambiguation (WSD) is the task of automatically identifying the appropriate sense of an ambiguous word based on the context in which the word is used

  • The results show that overall the supervised system obtains higher disambiguation accuracies than the unsupervised one, which is consistent with previous results, for example [4,5,6,7]

  • While the number of senses is not a good indicator of supervised WSD accuracy, it is better than the other measures at predicting unsupervised WSD accuracy on the National Library of Medicine (NLM)-WSD and Abbrev datasets (see the sketch after this list)
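
As a hedged illustration of how a sense-count predictor can be checked against observed WSD accuracy, the sketch below computes a Pearson correlation between the two; the terms, sense counts, and accuracy values are invented placeholders, not figures from the paper.

```python
# Minimal sketch: correlate a number-of-senses predictor with per-term WSD accuracy.
# All terms, sense counts, and accuracies below are illustrative placeholders only.
from scipy.stats import pearsonr

terms = ["cold", "culture", "discharge", "mole"]
num_senses = [3, 2, 2, 4]                 # number of senses per term (hypothetical)
wsd_accuracy = [0.78, 0.91, 0.85, 0.62]   # accuracy of a WSD system per term (hypothetical)

r, p = pearsonr(num_senses, wsd_accuracy)
print(f"Pearson r = {r:.2f} (p = {p:.3f})")
# A strongly negative r would suggest that terms with more senses are harder to disambiguate.
```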


Summary

Introduction

Word Sense Disambiguation (WSD) is the task of automatically identifying the appropriate sense of an ambiguous word based on the context in which the word is used. It is important to determine the accuracy of a WSD system for the ambiguities of interest in order to judge whether it will be useful for the overall application and, if so, which terms should be disambiguated. Manual annotation is an expensive, difficult and time-consuming process that is not practical to apply on a large scale [13]. Some of the methods applied in this paper are supervised, since they are based on information derived from a corpus containing examples of the ambiguous term labeled with the correct sense.
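
As a minimal sketch of the supervised setting described above, the following hypothetical example trains a bag-of-words classifier on sense-labeled contexts of one ambiguous term; the sentences and sense labels are invented for illustration, and scikit-learn is assumed to be available (the paper's own systems are not reproduced here).

```python
# Minimal sketch of supervised WSD for a single ambiguous term ("cold"),
# assuming a small corpus of contexts labeled with the correct sense.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Hypothetical labeled training examples: (context, sense).
contexts = [
    "patient presented with a severe cold and cough",
    "symptoms of the common cold lasted a week",
    "samples were stored in a cold room at four degrees",
    "the cold chain was maintained during transport",
]
senses = [
    "upper_respiratory_infection",
    "upper_respiratory_infection",
    "low_temperature",
    "low_temperature",
]

# Bag-of-words features from the surrounding context feed a Naive Bayes classifier.
model = make_pipeline(CountVectorizer(), MultinomialNB())
model.fit(contexts, senses)

# Disambiguate a new occurrence of the term from its surrounding context.
print(model.predict(["the vaccine must be kept cold during shipping"])[0])
```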

Resources
Measures of similarity and relatedness
Previous approaches
Pairwise similarity
Implementation
Example
Evaluation
Word sense disambiguation
WSD performance and corpus statistics
Results for previous approaches
Results for similarity and relatedness measures
Conclusion and future work