Self-training in significance space of support vectors for imbalanced biomedical event data.

Tsendsuren Munkhdalai, Keun Ho Ryu, Oyun-Erdene Namsrai

doi:10.1186/1471-2105-16-s7-s6

Open Access

https://doi.org/10.1186/1471-2105-16-s7-s6

Abstract

Background: Pairwise relationships extracted from biomedical literature are insufficient for formulating biomolecular interactions. Extraction of complex relations (namely, biomedical events) has become the main focus of the text-mining community. However, there are two critical issues that are seldom dealt with by existing systems. First, an annotated corpus for training a prediction model is highly imbalanced. Second, supervised models trained on only a single annotated corpus can limit system performance. Fortunately, there is a large pool of unlabeled data containing much of the domain background that one can exploit.

Results: In this study, we develop a new semi-supervised learning method to address the issues outlined above. The proposed algorithm efficiently exploits the unlabeled data to improve system performance. We furthermore extend our algorithm to a two-phase learning framework. The first phase balances the training data for initial model induction. The second phase incorporates domain knowledge into the event extraction model. The effectiveness of our method is evaluated on the Genia event extraction corpus and a PubMed document pool. Our method can identify a small subset of the majority class, which is sufficient for building a well-generalized prediction model. It outperforms the traditional self-training algorithm in terms of F-measure. Our model, based on the training data and the unlabeled data pool, achieves comparable performance to the state-of-the-art systems that are trained on a larger annotated set consisting of training and evaluation data.
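For orientation, the traditional self-training baseline that the abstract compares against can be sketched roughly as follows. This is a generic illustration, not the authors' proposed method: a toy nearest-centroid classifier stands in for the support vector machine, and the names `fit`, `predict_with_confidence`, `self_train`, the `threshold` value, and the sample data are all assumptions made for the example.

```python
from statistics import mean


def centroid(points):
    """Component-wise mean of a list of equal-length feature vectors."""
    return [mean(column) for column in zip(*points)]


def sq_dist(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def fit(labeled):
    """Toy stand-in for a real classifier: one centroid per class label."""
    by_class = {}
    for features, label in labeled:
        by_class.setdefault(label, []).append(features)
    return {label: centroid(rows) for label, rows in by_class.items()}


def predict_with_confidence(model, features):
    """Return (label, margin). The margin is the gap between the two
    nearest class centroids and serves as a crude confidence score.
    Assumes the model holds at least two classes."""
    ranked = sorted((sq_dist(features, c), label) for label, c in model.items())
    return ranked[0][1], ranked[1][0] - ranked[0][0]


def self_train(labeled, unlabeled, threshold=1.0, max_rounds=10):
    """Generic self-training loop: train, pseudo-label the confident
    unlabeled examples, add them to the training set, and repeat."""
    labeled = list(labeled)
    pool = list(unlabeled)
    for _ in range(max_rounds):
        model = fit(labeled)
        confident, remaining = [], []
        for features in pool:
            label, margin = predict_with_confidence(model, features)
            if margin >= threshold:
                confident.append((features, label))
            else:
                remaining.append(features)
        if not confident:
            break  # nothing new was learned, so stop early
        labeled.extend(confident)
        pool = remaining
    return fit(labeled), labeled


seed = [([0.0, 0.0], 0), ([4.0, 4.0], 1)]
pool = [[0.5, 0.2], [3.8, 4.1], [2.0, 2.0], [1.0, 1.2]]
model, enlarged = self_train(seed, pool)
print(len(enlarged), predict_with_confidence(model, [3.5, 3.5])[0])
```

In this toy run the point `[2.0, 2.0]` is equidistant from both seed examples, so it is held back in the first round and only pseudo-labeled once the centroids have shifted. The sketch does nothing about class imbalance, which is the gap the paper's two-phase framework is described as addressing.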

Highlights

Pairwise relationships extracted from biomedical literature are insufficient for formulating biomolecular interactions.
We compared the method against existing approaches to the data imbalance problem.
We evaluated the event extraction system built on our proposed method and report its evaluation measures alongside the GE’11 shared task entries.

Summary

Introduction

Pairwise relationships extracted from biomedical literature are insufficient for formulating biomolecular interactions: recognized named entities and extracted pairwise relationships alone do not suffice for understanding such interactions [9]. Extraction of complex relations (namely, biomedical events) has therefore received increasing attention and has become the main focus of the text-mining community. Two issues remain: an annotated corpus for training a prediction model is highly imbalanced, and supervised models trained on only a single annotated corpus can limit system performance. Rule-based systems tend to achieve high precision with low recall and to perform better on the prediction of simple events. Since most of their computation goes into matching pre-generated rules against text, such systems are computationally efficient.
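To make the precision and recall trade-off concrete, here is a minimal sketch of how these measures and the balanced F-measure (F1) are commonly computed for a binary task. The function name `precision_recall_f1` and the sample labels are assumptions for illustration; the shared-task evaluation itself uses its own event matching criteria, which are not reproduced here.

```python
def precision_recall_f1(gold, predicted, positive=1):
    """Compute precision, recall, and balanced F1 for one positive class."""
    pairs = list(zip(gold, predicted))
    tp = sum(1 for g, p in pairs if g == positive and p == positive)
    fp = sum(1 for g, p in pairs if g != positive and p == positive)
    fn = sum(1 for g, p in pairs if g == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    return precision, recall, 2 * precision * recall / (precision + recall)


# Imbalanced toy data: 10 positive examples, 90 negative examples.
gold = [1] * 10 + [0] * 90

# A classifier that always predicts the majority class is 90% accurate,
# yet it finds no positive example at all.
always_negative = [0] * 100
accuracy = sum(g == p for g, p in zip(gold, always_negative)) / len(gold)
print(accuracy, precision_recall_f1(gold, always_negative))
```

On imbalanced data such as this, accuracy rewards ignoring the minority class, which is one reason F-measure is the usual yardstick for event extraction.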

Journal: BMC Bioinformatics
Publication Date: Apr 23, 2015
Citations: 51
License type: CC BY
