Active learning for human protein-protein interaction prediction

Thahir P Mohamed, Madhavi K Ganapathiraju, Jaime G Carbonell

doi:10.1186/1471-2105-11-s1-s57

Open Access

https://doi.org/10.1186/1471-2105-11-s1-s57

Abstract

Background: Biological processes in cells are carried out by means of protein-protein interactions. Determining whether a pair of proteins interacts by wet-lab experiments is resource-intensive; only about 38,000 interactions, out of a few hundred thousand expected interactions, are known today. Active machine learning can guide the selection of pairs of proteins for future experimental characterization in order to accelerate accurate prediction of the human protein interactome.

Results: Random forest (RF) has previously been shown to be effective for predicting protein-protein interactions. Here, four different active learning algorithms have been devised for selecting the protein pairs used to train the RF. With labels of as few as 500 protein pairs selected using any of the four active learning methods described here, the classifier achieved a higher F-score (harmonic mean of precision and recall) than with 3000 randomly chosen protein pairs. The F-score of predicted interactions is shown to increase by about 15% with active learning compared with random selection of data.

Conclusion: Active learning algorithms enable learning more accurate classifiers with much less labelled data and prove useful in applications where manual annotation of data is formidable. The active learning techniques demonstrated here can also be applied to other proteomics applications such as protein structure prediction and classification.
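The abstract reports results as an F-score, the harmonic mean of precision and recall. As a minimal sketch of that definition (the function name and the example counts below are illustrative assumptions, not values from the paper), it could be computed from confusion counts as follows:

```python
def f_score(tp, fp, fn):
    """Harmonic mean of precision and recall from confusion counts.

    tp: predicted interacting pairs that truly interact
    fp: predicted interacting pairs that do not interact
    fn: truly interacting pairs the classifier missed
    """
    if tp == 0:
        # Precision and recall are both zero (or undefined); report 0.
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical counts, chosen only to illustrate the arithmetic:
# precision = 60 / 80 = 0.75, recall = 60 / 100 = 0.60
score = f_score(tp=60, fp=20, fn=40)
```

Because the harmonic mean is pulled toward the smaller of the two values, a classifier cannot score well by maximizing recall alone, which matters when interacting pairs are rare among all candidate pairs.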

Highlights

Biological processes in cells are carried out by means of protein-protein interactions
The vectors have 27 dimensions and contain features corresponding to Gene Ontology (GO) cell component (1), GO molecular function (1), GO biological process (1), co-occurrence in tissue (1), gene expression (16), sequence similarity (1), homology based (5) and domain interaction (1), where the numbers in brackets correspond to the number of elements contributed by the feature type to the feature vector
The GO features measure similarity of two genes based on the similarity between the terms they share in the Gene Ontology database
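The 27-dimensional feature vector described in the highlights above can be laid out as a small table of feature groups and widths. The sketch below assumes the groups appear in the vector in the order the text lists them; the actual ordering in the authors' data is not stated, and the names `FEATURE_LAYOUT` and `feature_slices` are hypothetical.

```python
# (feature group, number of elements), in the order listed in the text.
# The ordering within the real vectors is an assumption.
FEATURE_LAYOUT = [
    ("GO cell component", 1),
    ("GO molecular function", 1),
    ("GO biological process", 1),
    ("co-occurrence in tissue", 1),
    ("gene expression", 16),
    ("sequence similarity", 1),
    ("homology based", 5),
    ("domain interaction", 1),
]


def feature_slices(layout):
    """Map each feature group to its index range; also return total width."""
    slices = {}
    start = 0
    for name, width in layout:
        slices[name] = slice(start, start + width)
        start += width
    return slices, start


SLICES, TOTAL_DIMS = feature_slices(FEATURE_LAYOUT)
```

Summing the widths (1 + 1 + 1 + 1 + 16 + 1 + 5 + 1) gives 27, matching the stated dimensionality; under the assumed ordering, the 16 gene expression elements would occupy indices 4 through 19.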

Summary

Introduction

Biological processes in cells are carried out by means of protein-protein interactions. Several high-throughput methods, such as Yeast 2-Hybrid (Y2H) and mass spectrometry, help determine protein interactions. These methods suffer from high false positive rates, and many protein interaction predictions supported by one method are not supported by another. In complex organisms such as humans, applying high-throughput methods to test every possible protein pair (on the order of 10^8 pairs) would be very expensive in terms of cost and effort. Computational methods are necessary to complete the interactome expeditiously.
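The order-of-magnitude figure for the number of candidate protein pairs can be checked with a short calculation. The protein count used here (20,000) is an assumption chosen for illustration and is not given in the text; the 38,000 known interactions figure is taken from the abstract.

```python
from math import comb


def candidate_pairs(n_proteins):
    """Number of unordered pairs of distinct proteins: n * (n - 1) / 2."""
    return comb(n_proteins, 2)


# Assumed proteome size, for illustration only (not stated in the source).
N_PROTEINS_ASSUMED = 20_000
# Known interactions, as reported in the abstract.
KNOWN_INTERACTIONS = 38_000

pairs = candidate_pairs(N_PROTEINS_ASSUMED)
fraction_known = KNOWN_INTERACTIONS / pairs
```

Under this assumption the pair count is 199,990,000, roughly 2 x 10^8, so the known interactions cover only about 0.02% of candidate pairs. Self-interactions are excluded in this sketch.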

Journal: BMC Bioinformatics
Publication Date: Jan 1, 2010
Citations: 83
License type: cc-by
