An SVM-based system for predicting protein subnuclear localizations

Zhengdeng Lei, Yang Dai

doi:10.1186/1471-2105-6-291

Abstract

Background: The large gap between the number of protein sequences in databases and the number of functionally characterized proteins calls for a fast computational tool, generally applicable to protein sequences, for predicting subnuclear and subcellular localizations. Localization information may reveal the molecular function of novel proteins and provide insight into the biological pathways in which they function. Most past work has focused on protein subcellular localization, and no tool has been dedicated to prediction at the subnuclear level, despite its importance. Designing a suitable predictive system depends on extracting subtle sequence signals that can discriminate among proteins with different subnuclear localizations.

Results: New kernel functions for measuring sequence similarity are introduced for use in a support vector machine (SVM) learning model. The k-peptide vectors are first mapped by a matrix of high-scoring pairs of k-peptides, where pairs are scored with BLOSUM62. The kernels, which measure the similarity between sequences, are then defined on the mapped vectors. By combining these new encoding methods, a multi-class classification system for predicting protein subnuclear localizations is established for the first time. The performance of the system is evaluated on a set of proteins collected in the Nuclear Protein Database (NPD). The overall prediction accuracy for 6 localizations is about 50% (vs. 16.7% for random prediction) for single-localization proteins in leave-one-out cross-validation, and 65% for an independent set of multi-localization proteins. The integrated system is accessible online.

Conclusion: The integrated system benefits from the combination of predictions from several SVMs based on selected encoding methods. The predictive power of the system is expected to improve as more proteins with known subnuclear localizations become available.
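
As a rough illustration of the conventional k-peptide encoding mentioned in the abstract, the sketch below turns a sequence into a vector of normalized k-peptide frequencies and compares two sequences with a plain dot-product kernel. It is a minimal sketch under stated assumptions, not the authors' implementation: the function names `kpeptide_vector` and `linear_kernel`, the choice k = 2, and the toy sequences are all hypothetical, and the BLOSUM62-based mapping of the vectors described in the paper is not reproduced here.

```python
from collections import Counter
from itertools import product

# The 20 standard amino acids; k-peptides containing other symbols are ignored.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def kpeptide_vector(sequence, k=2):
    """Return normalized k-peptide frequencies in a fixed lexicographic order.

    Hypothetical helper for illustration only (not from the paper).
    """
    peptides = ["".join(p) for p in product(AMINO_ACIDS, repeat=k)]
    # Count every overlapping window of length k in the sequence.
    counts = Counter(sequence[i:i + k] for i in range(len(sequence) - k + 1))
    total = sum(counts[p] for p in peptides)
    if total == 0:
        return [0.0] * len(peptides)
    return [counts[p] / total for p in peptides]


def linear_kernel(u, v):
    """Plain dot product; the paper's kernels act on BLOSUM62-mapped vectors instead."""
    return sum(a * b for a, b in zip(u, v))


# Toy sequences, assumed purely for illustration.
x = kpeptide_vector("MKKLAAKK", k=2)
y = kpeptide_vector("MKKLA", k=2)
similarity = linear_kernel(x, y)

# Chance level for 6 equally likely localizations, as quoted in the abstract.
random_baseline = 1 / 6
```

With k = 2 each sequence becomes a 400-dimensional vector (20 x 20 dipeptides). A kernel value of this kind is what an SVM would consume as its similarity measure, and the 1/6 chance level corresponds to the 16.7% random-prediction baseline quoted above.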

Highlights

The large gap between the number of protein sequences in databases and the number of functionally characterized proteins calls for the development of a fast computational tool for the prediction of subnuclear and subcellular localizations generally applicable to protein sequences
The confinement of biomolecules within specific compartments is crucial for the formation and function of the cell nucleus; in contrast, the mis-localization of proteins can lead to both human genetic disease and cancer [3]
This study presents the performance of conventional k-peptide encoding methods and the newly proposed kernels for the prediction of protein subnuclear compartments

Summary

Introduction

The large gap between the number of protein sequences in databases and the number of functionally characterized proteins calls for a fast computational tool, generally applicable to protein sequences, for predicting subnuclear and subcellular localizations. Although protein complexes disperse throughout the entire organelle, many nuclear proteins participating in related pathways are known to concentrate in specific areas [1,2]. The confinement of biomolecules within specific compartments is crucial for the formation and function of the cell nucleus; in contrast, the mis-localization of proteins can lead to both human genetic disease and cancer [3]. Information on protein subnuclear localization is essential for a full understanding of genomic regulation and function. Computational prediction of protein subnuclear compartments from primary protein sequences can provide important clues to the function of novel proteins.

Journal: BMC Bioinformatics
Publication Date: Dec 1, 2005
Citations: 121
License type: cc-by
