Semi-supervised multi-label collective classification ensemble for functional genomics.

Qingyao Wu, Shen-Shyang Ho, Shuigeng Zhou, Yunming Ye

doi:10.1186/1471-2164-15-s9-s17

Abstract

Background: With the rapid accumulation of proteomic and genomic datasets, in terms of genome-scale features and interaction networks, through high-throughput experimental techniques, manually predicting the functional properties of proteins has become increasingly cumbersome, and computational methods to automate this annotation task are urgently needed. Most approaches to predicting functional properties of proteins require either identifying a reliable set of labeled proteins with attribute features similar to those of the unannotated proteins, or learning from a fully labeled protein interaction network with a large amount of labeled data. However, acquiring such labels can be very difficult in practice, especially for multi-label protein function prediction problems. Learning from only a few labeled examples can lead to poor performance, because only limited supervision knowledge can be obtained from similar proteins or from the connections between them. To annotate proteins effectively even when labeled data are scarce, it is important to take advantage of all data sources available in this problem setting, including interaction networks, attribute feature information, correlations of functional labels, and unlabeled data.

Results: In this paper, we show that predicting functional properties of proteins from various sources of relational data is, by its underlying nature, a typical collective classification (CC) problem in machine learning. The protein function prediction task with limited annotation is then cast into a semi-supervised multi-label collective classification (SMCC) framework. We propose a novel generative-model-based SMCC algorithm, called GM-SMCC, to effectively compute the label probability distributions of unannotated protein instances and predict their functional properties. To further boost predictive performance, we extend the method in an ensemble manner, called EGM-SMCC, which utilizes multiple heterogeneous networks with various latent linkages, constructed to explicitly model the relationships among the nodes, in order to effectively propagate the supervision knowledge from labeled to unlabeled nodes.

Conclusion: Experimental results on a yeast gene dataset, predicting the functions and localization of proteins, demonstrate the effectiveness of the proposed method. In the comparison, the proposed algorithms perform better than the other algorithms compared.
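
The idea of propagating supervision knowledge from labeled to unlabeled nodes over an interaction network can be illustrated with a small sketch. The code below is not GM-SMCC or EGM-SMCC; it is a plain neighbor-averaging label propagation scheme, swapped in only to show how a few annotated proteins can inform their unannotated neighbors in a multi-label setting. The function name `propagate_labels`, the toy network, and the 0.5 threshold are all assumptions made for illustration.

```python
def propagate_labels(neighbors, known, num_labels, iterations=50):
    """Spread per-label scores from annotated to unannotated nodes.

    neighbors: dict mapping each node to a list of adjacent nodes.
    known:     dict mapping annotated nodes to a set of label indices.
    Returns a dict mapping every node to a list of scores in [0, 1].
    """
    # Annotated nodes start at 1.0 for each known label; everything else at 0.0.
    scores = {
        node: [1.0 if k in known.get(node, ()) else 0.0 for k in range(num_labels)]
        for node in neighbors
    }
    for _ in range(iterations):
        updated = {}
        for node, adjacent in neighbors.items():
            if node in known or not adjacent:
                # Annotated nodes stay clamped; isolated nodes have nothing to average.
                updated[node] = scores[node]
            else:
                # Unannotated nodes take the mean score of their neighbors, per label.
                updated[node] = [
                    sum(scores[m][k] for m in adjacent) / len(adjacent)
                    for k in range(num_labels)
                ]
        scores = updated
    return scores


# Hypothetical toy network: a chain p1 - p2 - p3 - p4 with two functional labels.
toy_network = {"p1": ["p2"], "p2": ["p1", "p3"], "p3": ["p2", "p4"], "p4": ["p3"]}
toy_annotations = {"p1": {0}, "p4": {1}}

toy_scores = propagate_labels(toy_network, toy_annotations, num_labels=2)
# Threshold each label independently, so a node may get zero, one, or several labels.
toy_predictions = {
    node: {k for k, s in enumerate(vals) if s > 0.5}
    for node, vals in toy_scores.items()
}
```

In this toy chain, the unannotated protein next to `p1` ends up favoring label 0 and the one next to `p4` favors label 1. Scoring each label separately is what makes the scheme multi-label, although unlike the framework described above it ignores attribute features and correlations between labels.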

Highlights

With the rapid accumulation of proteomic and genomic datasets, in terms of genome-scale features and interaction networks, through high-throughput experimental techniques, manually predicting the functional properties of proteins has become increasingly cumbersome, and computational methods to automate this annotation task are urgently needed.
Experimental results on a yeast gene dataset, predicting the functions and localization of proteins, demonstrate the effectiveness of the proposed method.
In the field of functional genomics, manual annotation has become increasingly cumbersome with the rapid accumulation of proteomic and genomic datasets.

Summary

Introduction

With the rapid accumulation of proteomic and genomic datasets, in terms of genome-scale features and interaction networks, through high-throughput experimental techniques, manually predicting the functional properties of proteins has become increasingly cumbersome, and computational methods to automate this annotation task are urgently needed. Some of the commonly used features are characteristics of the amino acid sequence, textual repositories such as MEDLINE, and more biologically meaningful features such as motifs derived from motif analysis of protein sequences, the isoelectric point, and post-translational modifications. From these constructed attribute features, a predictive model is learnt by training a classifier on annotated proteins, and this model is then used to predict the functions of proteins [2,3,4,5].
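
The attribute-based approach described above (train a classifier on annotated proteins, then predict functions from features) can be sketched as follows. This is a minimal illustration using a per-label nearest-centroid rule, chosen only for brevity; it is not the classifier used in the cited works. The feature vectors, protein names, and function names (`train_centroids`, `predict_functions`) are hypothetical.

```python
def _mean(vectors):
    """Component-wise mean of a non-empty list of equal-length vectors."""
    return [sum(column) / len(vectors) for column in zip(*vectors)]


def _sq_dist(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def train_centroids(features, annotations, num_labels):
    """For each label, store the centroid of proteins with and without it.

    features:    dict mapping protein -> attribute feature vector.
    annotations: dict mapping annotated protein -> set of label indices.
    """
    model = []
    for k in range(num_labels):
        pos = [features[p] for p, labels in annotations.items() if k in labels]
        neg = [features[p] for p, labels in annotations.items() if k not in labels]
        model.append((_mean(pos) if pos else None, _mean(neg) if neg else None))
    return model


def predict_functions(model, vector):
    """Assign every label whose positive centroid is closer than its negative one."""
    predicted = set()
    for k, (pos, neg) in enumerate(model):
        if pos is None:
            continue  # no annotated example carries this label
        if neg is None or _sq_dist(vector, pos) < _sq_dist(vector, neg):
            predicted.add(k)
    return predicted


# Hypothetical two-dimensional attribute features and annotations.
toy_features = {
    "a": [1.0, 0.0], "b": [0.9, 0.1],
    "c": [0.0, 1.0], "d": [0.1, 0.9],
    "e": [1.0, 1.0],
}
toy_labels = {"a": {0}, "b": {0}, "c": {1}, "d": {1}, "e": {0, 1}}

toy_model = train_centroids(toy_features, toy_labels, num_labels=2)
single = predict_functions(toy_model, [0.95, 0.05])
double = predict_functions(toy_model, [1.0, 1.0])
```

Because each label is decided independently, a query protein can receive several functions at once. A model like this relies entirely on annotated proteins with similar features, which is the limitation that motivates also using network structure and unlabeled data.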

Journal: BMC Genomics
Publication Date: Dec 1, 2014
Citations: 49
License type: cc-by
