Abstract

One of the central problems in computational biology is to identify protein function in an automated, high-throughput fashion. A key step in this process is to predict the subcellular compartment to which a protein belongs, since a protein's localization correlates closely with its function. A wide variety of methods for protein subcellular localization have been proposed in recent years. They fall into two categories: sequence-based and database-based. Sequence-based methods extract useful features from amino acid sequences and strive to discover the principles behind the protein localization process, whereas database-based methods mine existing public annotation databases. This paper focuses on the sequence-based approach and exploits the discriminative information contained in amino acid sequences for protein subcellular localization. Using support vector machines (SVMs) as predictors, we compared the amino acid composition approach, the amino acid tuple approach, a voting scheme, and a new characteristic representation of proteins proposed in this paper. Our experiments were carried out on 7579 eukaryotic protein sequences from 12 subcellular locations. The highest accuracy, 82.8% in 5-fold cross-validation, was obtained by the voting scheme using five predictors. This is the best performance achieved on this dataset by a sequence-based approach. Our experiments demonstrate that there is considerable potential for improving prediction accuracy by exploiting protein sequences, which have not been fully utilized so far, and that further exploration in this direction is still needed.
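As a rough illustration of the sequence-based feature vectors and the voting step mentioned above, the sketch below computes amino acid composition (20 features) and contiguous amino acid tuple frequencies (20^k features; k = 2 gives dipeptides), then combines several predictors' labels by majority vote. The abstract does not specify the exact feature normalization, tuple length, or vote-combination rule, so normalized frequencies, contiguous k-tuples, and simple majority voting with first-seen tie-breaking are assumptions here. The function names are hypothetical, SVM training itself is omitted (these vectors would be the SVM inputs), and the paper's new characteristic representation is not reproduced.

```python
from collections import Counter
from itertools import product

# The 20 standard amino acids (one-letter codes); other symbols are ignored.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def aa_composition(seq: str) -> list[float]:
    """Hypothetical helper: fraction of each standard amino acid (20 values)."""
    counts = Counter(c for c in seq.upper() if c in AMINO_ACIDS)
    total = sum(counts.values())
    if total == 0:
        return [0.0] * len(AMINO_ACIDS)
    return [counts[a] / total for a in AMINO_ACIDS]


def aa_tuple_composition(seq: str, k: int = 2) -> list[float]:
    """Hypothetical helper: frequency of each contiguous k-tuple (20**k values)."""
    seq = seq.upper()
    windows = (seq[i:i + k] for i in range(len(seq) - k + 1))
    # Skip windows containing non-standard residues (e.g. X, B, Z).
    counts = Counter(w for w in windows if all(c in AMINO_ACIDS for c in w))
    total = sum(counts.values())
    keys = ["".join(p) for p in product(AMINO_ACIDS, repeat=k)]
    if total == 0:
        return [0.0] * len(keys)
    return [counts[key] / total for key in keys]


def majority_vote(predictions: list[str]) -> str:
    """Assumed rule: most frequent label wins; ties go to the label seen first."""
    if not predictions:
        raise ValueError("need at least one prediction")
    counts = Counter(predictions)
    best = max(counts.values())
    return next(label for label in predictions if counts[label] == best)


if __name__ == "__main__":
    print(len(aa_composition("MKTAYIAK")))        # 20 features
    print(len(aa_tuple_composition("MKTAYIAK")))  # 400 dipeptide features
    # Five hypothetical per-predictor labels combined into one prediction.
    print(majority_vote(["nucleus", "cytoplasm", "nucleus",
                         "mitochondrion", "nucleus"]))
```

Keeping the tuple keys in a fixed lexicographic order ensures every sequence maps to the same feature layout, which is what a vector-based classifier such as an SVM requires.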
