Abstract

Background: Proteins perform their functions in associated cellular locations. Therefore, the study of protein function can be facilitated by predictions of protein location. Protein location can be predicted either from the sequence of a protein alone by identification of targeting peptide sequences and motifs, or by homology to proteins of known location. A third approach, which is complementary, exploits the differences in amino acid composition of proteins associated with different cellular locations, and can be useful if motif and homology information are missing. Here we expand this approach by taking into account amino acid composition at different levels of amino acid exposure.

Results: Our method has two stages. In stage one, we trained multiple Support Vector Machines (SVMs) to score eukaryotic protein sequences for membership in each of three categories: nuclear, cytoplasmic and extracellular, plus an extra category, nucleocytoplasmic, accounting for the fact that a large number of proteins shuttle between those two locations. In stage two, we use an artificial neural network (ANN) to propose a category from the scores given to the four locations in stage one. The method reaches an accuracy of 68% when using 3D-derived values of amino acid exposure as input. Calibration of the method using predicted values of amino acid exposure allows classifying proteins without 3D information with an accuracy of 62%, and discerning proteins in different locations even if they share high levels of identity.

Conclusions: In this study we explored the relationship between residue exposure and protein subcellular location. We developed a new algorithm for subcellular location prediction that uses residue exposure signatures. Our algorithm uses a novel approach to address the multiclass classification problem. The algorithm is implemented as the web server 'NYCE' and can be accessed at http://cbdm.mdc-berlin.de/~amer/nyce.

Highlights

  • Proteins perform their functions in associated cellular locations

  • Our method benefits from the fact that there is evolutionary pressure for the selection of mutations that result in protein residues with side chains that have characteristic physicochemical properties according to the exposure of the residue and to the subcellular location of the protein

  • Our study demonstrated that the distribution of amino acids at different levels of exposure carries signal about the location of proteins


Summary

Introduction

Proteins perform their functions in associated cellular locations. Besides prediction from targeting sequences and motifs or from homology, a third, complementary approach exploits the differences in amino acid composition of proteins associated with different cellular locations, and can be useful if motif and homology information are missing. We expand this approach by taking into account amino acid composition at different levels of amino acid exposure. The cell is a three-dimensional space separated into different compartments. These cellular compartments have different functions and physicochemical environments. Subcellular location is a key feature in the functional characterization of proteins [2].
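To make the idea of "amino acid composition at different levels of amino acid exposure" concrete, the following is a minimal sketch of how such a feature vector could be computed. It is an illustration only, not the authors' implementation: the function name `exposure_binned_composition`, the use of a relative exposure value between 0 and 1 per residue, and the bin thresholds (0.25 and 0.5) are all assumptions made for this example.

```python
from collections import Counter

# The 20 standard amino acids, in a fixed order so feature positions are stable.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def exposure_binned_composition(sequence, exposure, thresholds=(0.25, 0.5)):
    """Return amino acid frequencies computed separately per exposure bin.

    sequence   -- one-letter amino acid string
    exposure   -- per-residue exposure values (assumed here to lie in 0..1)
    thresholds -- hypothetical bin boundaries; not taken from the paper
    """
    if len(sequence) != len(exposure):
        raise ValueError("sequence and exposure must have the same length")

    n_bins = len(thresholds) + 1
    counts = [Counter() for _ in range(n_bins)]

    # Assign each residue to a bin by counting how many thresholds it reaches.
    for residue, value in zip(sequence, exposure):
        bin_index = sum(value >= t for t in thresholds)
        counts[bin_index][residue] += 1

    # Normalise within each bin; non-standard residues are ignored,
    # and empty bins contribute zeros.
    features = []
    for bin_counts in counts:
        total = sum(bin_counts[aa] for aa in AMINO_ACIDS)
        for aa in AMINO_ACIDS:
            features.append(bin_counts[aa] / total if total else 0.0)
    return features


if __name__ == "__main__":
    # Toy input: two buried alanines and two exposed lysines.
    vector = exposure_binned_composition("AAKK", [0.1, 0.1, 0.9, 0.9])
    print(len(vector))  # 3 bins x 20 amino acids
```

A vector of this kind (20 frequencies per exposure bin) is the sort of input that per-location classifiers could then score; how the exposure values are obtained (from a 3D structure or from a predictor) is left outside this sketch.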

