Abstract

Background: Extracting protein-protein interactions from biomedical literature is an important task in biomedical text mining. Supervised machine learning methods have been used with great success in this task, but they tend to suffer from data sparseness because they can learn only from a limited amount of labeled data. In this work, we study the use of unlabeled biomedical texts to enhance the performance of supervised learning for this task. We use feature coupling generalization (FCG), a recently proposed semi-supervised learning strategy, to learn an enriched representation of local contexts in sentences from 47 million unlabeled examples and investigate the performance of the new features on the AIMED corpus.

Results: The new features generated by FCG achieve an F-score of 60.1 and produce a significant improvement over supervised baselines. The experimental analysis shows that FCG can make good use of sparse features that have little effect in supervised learning. The new features perform better in non-linear classifiers than in linear ones. We combine the new features with local lexical features, obtaining an F-score of 63.5 on the AIMED corpus, which is comparable with the current state-of-the-art results. We also find that simple Boolean lexical features derived only from local contexts are able to achieve competitive results against most syntactic feature/kernel based methods.

Conclusions: FCG creates many opportunities for designing new features, since many sparse features ignored by supervised learning can be utilized well. Interestingly, our results also demonstrate that state-of-the-art performance can be achieved without using any syntactic information in this task.

Highlights

  • Extracting protein-protein interactions from biomedical literature is an important task in biomedical text mining

  • Feature coupling generalization (FCG) creates many opportunities for designing new features, since many sparse features ignored by supervised learning can be utilized well

  • We present the application of the FCG semi-supervised learning strategy to the PPI extraction task and show that FCG features derived from simple lexical information can achieve good results and produce further improvement over a high baseline

Summary

Introduction

Extracting protein-protein interactions from biomedical literature is an important task in biomedical text mining. The task of protein-protein interaction extraction (PPIE) aims to extract interacting protein pairs from biomedical literature, which contributes to PPI network analysis and the discovery of new functions of proteins. In recent years, it has attracted a lot of research interest [2,3,4,5,6,7,8,9,10,11,12,13,14,15,16] from different domains such as bioinformatics, natural language processing (NLP), and machine learning. [...] Because in supervised learning there exists a feature space for each kernel [17], these methods essentially represent each interacting protein pair and its context by a feature vector, and the weights of the features are learned from labeled training data.
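
As a rough illustration of representing a protein pair and its local context by a feature vector, the sketch below builds Boolean lexical features from the words before, between, and after two protein mentions. It is a minimal example under stated assumptions, not the feature set used in the paper: the function name `boolean_context_features`, the window size of 3, the `before=`/`between=`/`after=` feature names, and the `PROT1`/`PROT2` placeholders are all hypothetical.

```python
from typing import List, Set


def boolean_context_features(tokens: List[str], i: int, j: int,
                             window: int = 3) -> Set[str]:
    """Return Boolean lexical features for the protein pair at positions i < j.

    Each feature is simply present or absent (hence a set). The window size
    and the feature naming scheme are illustrative assumptions.
    """
    if not (0 <= i < j < len(tokens)):
        raise ValueError("expected 0 <= i < j < len(tokens)")

    features: Set[str] = set()
    # Words in a small window before the first protein mention.
    for word in tokens[max(0, i - window):i]:
        features.add("before=" + word.lower())
    # All words between the two protein mentions.
    for word in tokens[i + 1:j]:
        features.add("between=" + word.lower())
    # Words in a small window after the second protein mention.
    for word in tokens[j + 1:j + 1 + window]:
        features.add("after=" + word.lower())
    return features


# Hypothetical sentence with the two candidate proteins already masked.
tokens = "We show that PROT1 binds to PROT2 in vitro".split()
feats = boolean_context_features(tokens, 3, 6)
print(sorted(feats))
```

A supervised classifier would then learn one weight per feature from labeled pairs. Features that occur only rarely in the labeled data receive poorly estimated weights, which is the data sparseness problem that the abstract says FCG addresses with unlabeled text.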
