Large-scale directional relationship extraction and resolution

Cory B Giles, Jonathan D Wren

doi:10.1186/1471-2105-9-s9-s11

Open Access

Journal: BMC Bioinformatics
Publication Date: Aug 12, 2008
Citations: 68
License type: CC BY 2.0

Affiliation: Oklahoma Medical Research Foundation

Abstract

Background: Relationships between entities such as genes, chemicals, metabolites, phenotypes and diseases in MEDLINE are often directional. That is, one may affect the other in a positive or negative manner. Detection of causality and direction is key in piecing pathways together and in examining possible implications of experimental results. Because of the size and growth of biomedical literature, it is increasingly important to be able to automate this process as much as possible.

Results: Here we present a method of relation extraction using dependency graph parsing with SVM classification. We first tested the SVM classifier on gold standard corpora from GENIA and found that it achieved 82% precision and 94.8% recall (F-measure of 87.9) on these standardized test sets. We then applied the entire system to all available MEDLINE abstracts for two target interactions with known effects. We find that while some directional relations are extracted with low ambiguity, others are apparently contradictory, at least when considered in an isolated context. On examination, some of these apparent contradictions depend on the surrounding context (e.g. whether the relationship referred to short-term or long-term effects, or whether the focus was extracellular versus intracellular).

Conclusion: Thesaurus-based directional relation extraction can be done with reasonable accuracy, but is prone to false positives on larger corpora due to noun modifiers. Furthermore, methods of resolving or disambiguating relationship context and contingencies are important for large-scale corpora.
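The reported F-measure can be checked against the reported precision and recall. The short sketch below assumes the abstract's F-measure is the balanced F1 score (the harmonic mean of precision and recall); the abstract does not state the weighting, and the function name `f_measure` is only illustrative.

```python
def f_measure(precision: float, recall: float, beta: float = 1.0) -> float:
    """F-beta score; beta = 1 gives the harmonic mean of precision and recall."""
    if precision <= 0.0 or recall <= 0.0:
        return 0.0
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


# Values reported for the GENIA test sets: 82% precision, 94.8% recall.
f1 = f_measure(0.82, 0.948)
print(f"{100 * f1:.1f}")  # prints 87.9
```

Under that assumption the result agrees with the stated figure of 87.9.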

Highlights

Large-scale or systems-wide analysis of relationship networks is built from individual links.
We present a directional relation extraction (DRE) system that uses a support vector machine (SVM) to classify dependency paths, since SVMs have recently been shown to be well suited to this type of task [26].
We present the results of cross-validation on the GENIA event corpus and explore the results of large-scale extraction from MEDLINE abstracts.
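To make the notion of a dependency path concrete, here is a minimal sketch of extracting the path between two entity mentions from a dependency parse. It is not the authors' implementation: the toy sentence, the edge list, the relation labels and the function names are all hypothetical, and taking the shortest path (via breadth-first search) is an assumption about how such a path might be chosen. In a system like the one described, a path of this kind would be turned into features for the SVM classifier.

```python
from collections import deque

# Toy dependency parse of "GeneA strongly inhibits GeneB" (hypothetical sentence).
# Each edge is (head, relation, dependent).
EDGES = [
    ("inhibits", "nsubj", "GeneA"),
    ("inhibits", "dobj", "GeneB"),
    ("inhibits", "advmod", "strongly"),
]


def build_graph(edges):
    """Undirected adjacency list that remembers each edge's direction as a label."""
    graph = {}
    for head, rel, dep in edges:
        graph.setdefault(head, []).append((dep, f"-{rel}->"))
        graph.setdefault(dep, []).append((head, f"<-{rel}-"))
    return graph


def dependency_path(edges, source, target):
    """Breadth-first search for the shortest path between two tokens."""
    graph = build_graph(edges)
    queue = deque([(source, [source])])
    seen = {source}
    while queue:
        node, path = queue.popleft()
        if node == target:
            return path
        for neighbor, label in graph.get(node, []):
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append((neighbor, path + [label, neighbor]))
    return None  # the tokens are not connected in this parse


path = dependency_path(EDGES, "GeneA", "GeneB")
print(" ".join(path))  # prints: GeneA <-nsubj- inhibits -dobj-> GeneB
```

The arrows record which token is the head of each relation, so the same pair of entities yields a different path when their grammatical roles are swapped; the modifier "strongly" is off the path and is ignored.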

Summary

Introduction

Large-scale or systems-wide analysis of relationship networks (e.g. protein-protein) is built from individual links.

BMC Bioinformatics 2008, 9(Suppl 9):S11 http://www.biomedcentral.com/1471-2105/9/S9/S11

[...] chance of being related in a non-trivial manner, while entities co-occurring within an abstract have approximately a 50% chance [5,6] (exact numbers vary). These co-occurrence based approaches, despite their computational efficiency, necessarily remain agnostic about the nature of the relationship between entities. If we are to have any hope of understanding how control and cause/effect are propagated in these networks, we must establish directionality.

Relationships between entities such as genes, chemicals, metabolites, phenotypes and diseases in MEDLINE are often directional. Because of the size and growth of biomedical literature, it is increasingly important to be able to automate this process as much as possible.
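As a small illustration of why co-occurrence counting says nothing about direction, the sketch below counts entity pairs that appear in the same abstract. The abstracts, entity names and function name are made up for the example, and entity recognition is assumed to have been done already.

```python
from collections import Counter
from itertools import combinations

# Hypothetical entity mentions per abstract (already recognized and normalized).
ABSTRACTS = [
    ["insulin", "glucose"],
    ["glucose", "insulin", "diabetes"],
    ["insulin", "diabetes"],
]


def cooccurrence_counts(abstracts):
    """Count, for each unordered entity pair, the abstracts mentioning both."""
    counts = Counter()
    for entities in abstracts:
        # Sorting makes (a, b) and (b, a) the same key: order carries no meaning.
        for a, b in combinations(sorted(set(entities)), 2):
            counts[(a, b)] += 1
    return counts


counts = cooccurrence_counts(ABSTRACTS)
print(counts[("glucose", "insulin")])  # prints 2
```

The keys are unordered pairs, so the counts cannot distinguish "insulin lowers glucose" from "glucose raises insulin"; recovering that distinction is what directional relation extraction adds.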

Methods

Results

Conclusion