Abstract

Background: Biomedical extraction based on supervised machine learning still faces the problem that a limited labeled dataset does not saturate the learning method. Many supervised learning algorithms for bio-event extraction suffer from data sparseness.

Methods: In this study, we present a semi-supervised method that combines labeled data with large-scale unlabeled data to improve the performance of biomedical event extraction. We propose a rich feature vector comprising a variety of syntactic and semantic features, such as N-gram features, walk subsequence features, and predicate argument structure (PAS) features, and in particular some new features derived from a strategy named Event Feature Coupling Generalization (EFCG). The EFCG algorithm creates useful event recognition features from the correlation between two sorts of original features drawn from the labeled data, where the correlation is computed with the help of massive amounts of unlabeled data. The EFCG approach aims to solve the data sparseness problem caused by the limited annotated corpus, and enables the new features to cover much more event-related information with better generalization properties.

Results: The effectiveness of our event extraction system is evaluated on datasets from the BioNLP Shared Task 2011 and PubMed. Experimental results demonstrate state-of-the-art performance on this fine-grained biomedical information extraction task.

Conclusions: Limited labeled data can be combined with unlabeled data to tackle the data sparseness problem by means of our EFCG approach, and the classification capability of the model is enhanced by building a rich feature set from both labeled and unlabeled datasets. This semi-supervised learning approach could therefore go far towards improving the performance of the event extraction system. To the best of our knowledge, this is the first attempt at combining labeled and unlabeled data for tasks related to biomedical event extraction.
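
The abstract does not specify how EFCG measures the correlation between the two sorts of features. As a rough illustration only, the Python sketch below assumes pointwise mutual information (PMI) over co-occurrence counts in unlabeled data; the function names, the PMI measure, the averaging step, and the toy feature strings are all assumptions made for this example, not the authors' implementation.

```python
import math
from collections import Counter
from itertools import product


def cooccurrence_pmi(unlabeled, feats_a, feats_b):
    """PMI for each (a, b) feature pair, counted over unlabeled feature sets.

    ASSUMPTION: PMI stands in for the unspecified correlation measure.
    """
    n = len(unlabeled)
    count = Counter()   # number of contexts containing each feature
    joint = Counter()   # number of contexts containing both a and b
    for context in unlabeled:
        a_here = context & feats_a
        b_here = context & feats_b
        count.update(a_here | b_here)
        joint.update(product(a_here, b_here))
    return {
        (a, b): math.log((c_ab * n) / (count[a] * count[b]))
        for (a, b), c_ab in joint.items()
    }


def coupled_features(example, feats_a, feats_b, pmi):
    """Map an example's sparse A-features to one dense score per B-feature."""
    present = example & feats_a
    out = {}
    for b in sorted(feats_b):
        scores = [pmi[(a, b)] for a in present if (a, b) in pmi]
        # ASSUMPTION: average the pair scores; 0.0 when no pair was observed.
        out[f"coupled:{b}"] = sum(scores) / len(scores) if scores else 0.0
    return out


# Toy data (hypothetical): each unlabeled context is a set of feature strings.
FEATS_A = {"w=expression", "w=binds"}           # sparse lexical features
FEATS_B = {"t=Gene_expression", "t=Binding"}    # class-indicative features
UNLABELED = [
    {"w=expression", "t=Gene_expression"},
    {"w=expression", "t=Gene_expression"},
    {"w=binds", "t=Binding"},
    {"w=binds", "t=Binding"},
]

PMI = cooccurrence_pmi(UNLABELED, FEATS_A, FEATS_B)
NEW_FEATS = coupled_features({"w=expression"}, FEATS_A, FEATS_B, PMI)
```

In this toy setup the sparse lexical feature is replaced by one score per class-indicative feature, which is one way such coupled features could generalize beyond the labeled corpus.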

Highlights

  • Biomedical extraction based on supervised machine learning still faces the problem that a limited labeled dataset does not saturate the learning method

  • Supervised methods for biomedical event extraction are often affected by data sparseness

  • Through the Event Feature Coupling Generalization (EFCG) strategy, the classification capability of the model is enhanced, improving the performance of biomedical event extraction

Summary

Introduction

Biomedical extraction based on supervised machine learning still faces the problem that a limited labeled dataset does not saturate the learning method. Many supervised learning algorithms for bio-event extraction suffer from data sparseness. Several models of biomedical event extraction have aroused substantial interest in the bioinformatics domain. The expressive event representation captures extracted knowledge as structured, recursively nested, typed associations of arbitrarily many participants in specific roles [1]. Event extraction refers to tasks whose purpose is to extract information beyond the entity level. The proposed approaches to event extraction fall into two main groups: rule-based and machine learning (ML)-based methods.

ID    Type              Trigger      Theme    Cause
E1    Gene_expression   Expression   RANTES
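
The event representation described above (typed, recursively nested, with participants in named roles) can be sketched as a small Python data structure. This is only an illustrative model: the class names, the entity ID `T1`, and the nested event `E2` with its trigger word are hypothetical and are not taken from the source; only the E1 values come from the example above.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Union


@dataclass(frozen=True)
class Entity:
    """A named entity mention, e.g. a protein."""
    id: str
    text: str


@dataclass
class Event:
    """A typed event with a trigger and any number of role-labelled arguments."""
    id: str
    type: str
    trigger: str
    # Each argument is (role, participant); a participant may itself be an
    # Event, which is what makes the representation recursively nested.
    arguments: list[tuple[str, Union[Entity, "Event"]]] = field(default_factory=list)

    def depth(self) -> int:
        """Nesting depth: 1 for a flat event, +1 per level of nested events."""
        nested = [arg.depth() for _, arg in self.arguments if isinstance(arg, Event)]
        return 1 + max(nested, default=0)


# E1 from the example above; the entity ID "T1" is a hypothetical placeholder.
e1 = Event("E1", "Gene_expression", "Expression", [("Theme", Entity("T1", "RANTES"))])

# Hypothetical nested event whose Theme is E1 (not from the source).
e2 = Event("E2", "Positive_regulation", "induces", [("Theme", e1)])
```

Storing arguments as a list of (role, participant) pairs rather than a dictionary lets an event carry several participants in the same role.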
