Life science researchers and health care professionals rely heavily on biomedical literature databases such as MEDLINE to access information essential for research, health care, and education, and to keep up with the latest developments in their fields. Providing ways to efficiently access and analyze textual information is critical, and it is becoming more challenging with the increasing volume of publications in the biomedical domain: the last decade has shown an exponential rate of growth in the biomedical literature [1].

Natural language processing, a symbiosis of computer science and linguistics, addresses the computational aspects of automatic text processing. This field offers fertile ground for machine learning algorithms: the challenges of processing natural language open new opportunities for existing machine learning methods and promote the development of new ones.

The special session "Machine Learning in Biomedical Literature Analysis and Text Retrieval" was held for the first time as part of the 9th International Conference on Machine Learning and Applications (ICMLA 2010), in Washington DC on December 12-14, 2010. The goal of this session was to present advances in machine learning techniques that improve the analysis of biomedical text. In this supplement we present a collection of papers originally presented and published in the ICMLA 2010 proceedings. These papers go beyond the work originally presented at the conference and have passed a separate rigorous review process. They represent a wide cross-section of current work in machine learning, with a focus on biomedical literature. Papers in this supplement draw on established machine learning methods such as wide-margin classifiers and conditional random fields.
They suggest novel applications for these methods and also propose new machine learning techniques, such as novel methods for constructing training data and gold standards. From the literature analysis and text retrieval perspective, this collection covers multiple topics, including tokenization, named entity recognition, word-sense disambiguation, sequence labeling, and relationship extraction.

Tokenization is typically the first step in natural language processing and is often assumed to be trivial. In practice it is quite challenging, especially in the biomedical domain. Barrett and Weber-Jahnke [2] present an intriguing scheme for building a tokenizer.

Named entity recognition is an important component of text analysis tools, and three papers in the supplement address it. Yeganova et al. [3] present a method for detecting abbreviations and their definitions in the biomedical literature. Islamaj Dogan et al. [4] present an approach that detects clinical problem, treatment, and test phrases in patient records and doctor notes with high accuracy. Benton et al. [5] present a system for de-identifying personal information in medical message board text.

Many applications are believed to benefit from identifying the correct word sense in entity recognition tasks. MetaMap [6], for example, a system that annotates free text with UMLS [7] concepts and semantic types, can benefit significantly from word-sense disambiguation. Jimeno-Yepes et al. [8] use collocation analysis to improve a knowledge-based word-sense disambiguation system.

Automatic extraction of bibliographic data, such as article titles, author names, abstracts, and references, is essential to citation databases such as MEDLINE. Zhang et al. [9] examine the task of identifying the components of bibliographic references, treating it as a sequence labeling problem.
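Framing reference parsing as sequence labeling means assigning a field label to each token of a reference string. The sketch below is purely illustrative: the label set and the rule-based labeler standing in for a trained sequence model are invented here for exposition and are not Zhang et al.'s actual method or features.

```python
def featurize(tokens):
    """Map each token to a small feature dict (a trained model would use many more)."""
    feats = []
    for i, tok in enumerate(tokens):
        core = tok.strip("().,")
        feats.append({
            "is_year": core.isdigit() and len(core) == 4,
            "is_capitalized": tok[:1].isupper(),
            "position": i / max(len(tokens) - 1, 1),
        })
    return feats

def rule_label(tokens):
    """Toy stand-in for a learned sequence labeler: tag AUTHOR/YEAR/TITLE spans."""
    labels = []
    seen_year = False
    for f in featurize(tokens):
        if f["is_year"] and not seen_year:
            labels.append("YEAR")
            seen_year = True
        elif not seen_year:
            labels.append("AUTHOR")  # tokens before the year: author zone
        else:
            labels.append("TITLE")   # tokens after the year: title zone
    return labels

tokens = "Smith J. (2010) Machine learning for text .".split()
labeled = list(zip(tokens, rule_label(tokens)))
```

A learned model (e.g. a conditional random field) replaces the hand-written rules with label transitions and feature weights estimated from annotated references, but the input/output shape is the same: one label per token.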
Access to gold-standard training data allows scientists to focus on solving the problem at hand. This collection includes two papers dedicated to this issue. Wilbur and Kim [10] use human relevance judgments of MEDLINE document pairs to improve gold-standard annotations, whereas Yeganova et al. [3] present a method that relies on naturally occurring positive training examples and synthetically generated negative training examples to train their model.

Finally, Islamaj Dogan et al. [4] investigate a clinical relationship extraction problem. They approach it as a classification task, training classifiers to assign a relationship type to a pair of clinical concepts after performing entity recognition.
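In this classification framing, candidate pairs of recognized concepts are enumerated and each pair is assigned a relation label. The sketch below is a minimal illustration only: the relation labels and the keyword rules standing in for trained classifiers are invented here and do not reproduce Islamaj Dogan et al.'s label scheme or features.

```python
from itertools import combinations

def classify_pair(sentence, e1, e2):
    """Toy stand-in for a trained classifier over a (concept, concept) pair."""
    text = sentence.lower()
    if "relieves" in text or "improves" in text:
        return "TREATMENT_IMPROVES_PROBLEM"  # hypothetical label
    if "reveals" in text or "shows" in text:
        return "TEST_REVEALS_PROBLEM"        # hypothetical label
    return "NO_RELATION"

def extract_relations(sentence, entities):
    """Enumerate entity pairs found by entity recognition and classify each one."""
    return {(e1, e2): classify_pair(sentence, e1, e2)
            for e1, e2 in combinations(entities, 2)}

rels = extract_relations("Ibuprofen relieves the headache",
                         ["Ibuprofen", "headache"])
```

The two-stage pipeline structure, entity recognition first and pairwise relation classification second, is the point of the sketch; in the actual systems the keyword rules are replaced by classifiers trained on annotated clinical text.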