A multi-label predictor for identifying the subcellular locations of singleplex and multiplex eukaryotic proteins.

Xiao Wang, Guo-Zheng Li

doi:10.1371/journal.pone.0036317

Open Access

Journal: PLoS ONE
Publication Date: May 22, 2012
Citations: 40
License type: CC BY 4.0

Affiliation: Tongji University

Abstract

Subcellular locations of proteins are important functional attributes. An effective and efficient subcellular localization predictor is necessary for rapidly and reliably annotating subcellular locations of proteins. Most existing subcellular localization methods deal only with single-location proteins. In fact, proteins may simultaneously exist at, or move between, two or more different subcellular locations. To better reflect the characteristics of such multiplex proteins, new methods for dealing with them are highly desirable. In this paper, we develop a new predictor, called Euk-ECC-mPLoc, that can deal with systems containing both singleplex and multiplex eukaryotic proteins. It introduces a powerful multi-label learning approach that exploits correlations between subcellular locations, and it hybridizes gene ontology with dipeptide composition information. It can be used to assign eukaryotic proteins to the following 22 locations: (1) acrosome, (2) cell membrane, (3) cell wall, (4) centrosome, (5) chloroplast, (6) cyanelle, (7) cytoplasm, (8) cytoskeleton, (9) endoplasmic reticulum, (10) endosome, (11) extracellular, (12) Golgi apparatus, (13) hydrogenosome, (14) lysosome, (15) melanosome, (16) microsome, (17) mitochondrion, (18) nucleus, (19) peroxisome, (20) spindle pole body, (21) synapse, and (22) vacuole. Experimental results on a stringent benchmark dataset of eukaryotic proteins, obtained by the jackknife cross-validation test, show that the average success rate and overall success rate achieved by Euk-ECC-mPLoc were 69.70% and 81.54%, respectively, indicating that our approach is quite promising. In particular, the success rates achieved by Euk-ECC-mPLoc for small subsets were remarkably improved, indicating that it holds high potential for stimulating the development of the area.
As a user-friendly web-server, Euk-ECC-mPLoc is freely accessible to the public at the website http://levis.tongji.edu.cn:8080/bioinfo/Euk-ECC-mPLoc/. We believe that Euk-ECC-mPLoc may become a useful high-throughput tool, or at least play a complementary role to the existing predictors in identifying subcellular locations of eukaryotic proteins.
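
The abstract names dipeptide composition as one of the two feature sources but does not define it. As a minimal sketch, assuming the standard definition (the normalized frequency of each of the 400 ordered pairs of adjacent standard amino acids), it could be computed as follows. The function name and the choice to skip pairs containing non-standard residues are illustrative assumptions, not details taken from the paper.

```python
from itertools import product

# The 20 standard amino acids in one-letter code.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
# All 400 ordered pairs, in a fixed order so every vector has the same layout.
DIPEPTIDES = ["".join(pair) for pair in product(AMINO_ACIDS, repeat=2)]


def dipeptide_composition(sequence):
    """Return the 400 dipeptide frequencies of a protein sequence.

    Assumption: pairs containing a non-standard residue (e.g. 'X') are
    skipped, and frequencies are normalized by the number of counted pairs.
    """
    seq = sequence.upper()
    counts = dict.fromkeys(DIPEPTIDES, 0)
    total = 0
    for i in range(len(seq) - 1):
        pair = seq[i:i + 2]
        if pair in counts:
            counts[pair] += 1
            total += 1
    if total == 0:
        # Sequence too short or entirely non-standard: all-zero vector.
        return [0.0] * len(DIPEPTIDES)
    return [counts[d] / total for d in DIPEPTIDES]
```

For the toy sequence "ACAC" the adjacent pairs are AC, CA, and AC, so the AC entry is 2/3, the CA entry is 1/3, and the remaining 398 entries are zero.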

Highlights

Proteins perform their appropriate functions only when they are located in the correct subcellular locations
Much effort has been devoted to dealing with this challenge, and a large number of computational methods have been developed to predict the subcellular localization of proteins
We focus on predicting the subcellular locations of eukaryotic proteins with both singleplex and multiplex sites

Summary

Introduction

Proteins perform their appropriate functions only when they are located in the correct subcellular locations.

Here, $N(\mathrm{rep})$ is the number of representative proteins in $X_{P\text{-homo}}$, and

$$
g(u,k) =
\begin{cases}
1, & \text{if the } k\text{-th representative protein hits the } u\text{-th GO compress number} \\
0, & \text{otherwise}
\end{cases}
\tag{5}
$$

Note that the GO feature extraction method may yield a naught vector, or a meaningless one, under either of the following situations: (1) the protein P does not have significant homology to any protein in the Swiss-Prot database, i.e., $X_{P\text{-homo}} = \emptyset$, meaning the homology set $X_{P\text{-homo}}$ is empty; (2) its representative proteins do not contain any useful GO information for statistical prediction based on a given training dataset.
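
The indicator in Eq. (5) and the two degenerate situations can be illustrated with a short sketch. The equation that combines the indicators is not reproduced in this summary, so the averaging over N(rep) used below is an assumption suggested by the definition of N(rep), not a confirmed formula. The function names and the GO identifiers in the usage note are hypothetical.

```python
def go_indicator(u, hits_k):
    """g(u, k) of Eq. (5): 1 if the k-th representative protein hits the
    u-th GO number, 0 otherwise. hits_k is the set of GO numbers hit by
    that protein."""
    return 1 if u in hits_k else 0


def go_feature_vector(rep_hits, go_numbers):
    """Build one GO feature per GO number.

    rep_hits: list with one set of GO numbers per representative protein.
    go_numbers: ordered list of the GO numbers used as feature dimensions.

    Assumption: each feature is the fraction of representative proteins
    that hit the GO number, i.e. the sum of g(u, k) over k divided by
    N(rep). The summary does not confirm this combination rule.
    """
    n_rep = len(rep_hits)
    if n_rep == 0:
        # Situation (1): empty homology set, so the vector is naught.
        return [0.0] * len(go_numbers)
    return [
        sum(go_indicator(u, hits) for hits in rep_hits) / n_rep
        for u in go_numbers
    ]


def is_naught(vector):
    """True when every component of the feature vector is zero."""
    return not any(vector)
```

With two representative proteins hitting {"GO:1", "GO:2"} and {"GO:2"}, and feature dimensions ["GO:1", "GO:2", "GO:3"], the sketch returns [0.5, 1.0, 0.0]. An empty homology set, or representative proteins that hit none of the listed GO numbers, gives a naught vector, matching situations (1) and (2) above.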

Results

Conclusion