Abstract

Experimentally determined or computationally predicted protein phosphorylation sites for distinct species are becoming increasingly common. In this paper, we compare the predictive performance of a novel classification algorithm with different encoding schemes to develop a rice-specific protein phosphorylation site predictor. Our results suggest that the combination of Amino acid occurrence Frequency with Composition of K-Spaced Amino Acid Pairs (AF-CKSAAP) provides the best description of the relevant sequence features surrounding a phosphorylation site. A support vector machine (SVM) using AF-CKSAAP achieves the best performance in classifying rice protein phosphorylation sites when compared to the other algorithms. We have used SVM with AF-CKSAAP to construct a rice-specific protein phosphorylation site predictor, Rice_Phospho 1.0 (http://bioinformatics.fafu.edu.cn/rice_phospho1.0). We measure the Accuracy (ACC) and Matthews Correlation Coefficient (MCC) of Rice_Phospho 1.0 to be 82.0% and 0.64, respectively, both significantly higher than the corresponding values for other predictors such as Scansite, Musite, PlantPhos and PhosphoRice. Rice_Phospho 1.0 also successfully predicted the experimentally identified phosphorylation sites in LOC_Os03g51600.1, a protein sequence that did not appear in the training dataset. In summary, Rice_Phospho 1.0 outputs reliable predictions of protein phosphorylation sites in rice, and will serve as a useful tool to the community.
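
For readers unfamiliar with the two measures reported above, ACC and MCC can both be computed from the four cells of a binary confusion matrix. The following is a minimal sketch using the standard textbook definitions; the function names and the example counts are hypothetical and are not taken from the paper's evaluation.

```python
import math


def accuracy(tp: int, tn: int, fp: int, fn: int) -> float:
    """Fraction of all predictions that are correct."""
    return (tp + tn) / (tp + tn + fp + fn)


def matthews_corrcoef(tp: int, tn: int, fp: int, fn: int) -> float:
    """Matthews Correlation Coefficient for a binary confusion matrix.

    Ranges from -1 (total disagreement) to +1 (perfect prediction).
    Returns 0.0 when a row or column of the matrix is empty, which is
    a common convention for the otherwise undefined case.
    """
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0
    return (tp * tn - fp * fn) / denom


if __name__ == "__main__":
    # Hypothetical counts, chosen only to exercise the functions.
    counts = dict(tp=40, tn=45, fp=5, fn=10)
    print(f"ACC = {accuracy(**counts):.3f}")
    print(f"MCC = {matthews_corrcoef(**counts):.3f}")
```

Unlike ACC, MCC uses all four cells of the confusion matrix, so it remains informative when phosphorylated and non-phosphorylated sites are unevenly represented in a test set.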

Highlights

  • The performance of each of the three encoding schemes on its own was measured on datasets of different sizes, with a Support Vector Machine (SVM) as the classifier

  • We found that combining all three encoding schemes did not significantly outperform Composition of K-Spaced Amino Acid Pairs (CKSAAP) alone (data not shown), but did increase the feature dimension
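
To make the feature-dimension remark concrete, the sketch below shows one common way to compute AF and CKSAAP features for a sequence window centred on a candidate site. It is an illustration under stated assumptions, not the authors' implementation: the function names, the restriction to the 20 standard amino acids, the normalisation, and the default maximum spacing `k_max` are all hypothetical choices.

```python
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # 20 standard residues (assumption)
PAIRS = ["".join(p) for p in product(AMINO_ACIDS, repeat=2)]  # 400 ordered pairs


def af_features(window: str) -> list[float]:
    """Amino acid occurrence frequency: one value per residue type (20 dims)."""
    residues = [aa for aa in window if aa in AMINO_ACIDS]
    total = len(residues) or 1  # avoid division by zero on an empty window
    return [residues.count(aa) / total for aa in AMINO_ACIDS]


def cksaap_features(window: str, k_max: int = 2) -> list[float]:
    """Composition of k-spaced amino acid pairs for k = 0..k_max.

    For each spacing k, count every ordered pair (window[i], window[i+k+1])
    and divide by the number of positions where such a pair can occur.
    Yields 400 * (k_max + 1) dimensions.
    """
    features: list[float] = []
    for k in range(k_max + 1):
        counts = dict.fromkeys(PAIRS, 0)
        n_positions = len(window) - k - 1
        for i in range(n_positions):
            pair = window[i] + window[i + k + 1]
            if pair in counts:  # skip pairs with non-standard symbols
                counts[pair] += 1
        denom = n_positions if n_positions > 0 else 1
        features.extend(counts[p] / denom for p in PAIRS)
    return features


def af_cksaap_features(window: str, k_max: int = 2) -> list[float]:
    """Concatenate AF and CKSAAP into a single feature vector."""
    return af_features(window) + cksaap_features(window, k_max)


if __name__ == "__main__":
    window = "MKRLSASPEEDKT"  # made-up 13-residue window, not real data
    print(len(af_features(window)), len(cksaap_features(window)),
          len(af_cksaap_features(window)))
```

Under these assumptions, AF contributes 20 dimensions while CKSAAP contributes 400 per spacing value, so CKSAAP dominates the size of the combined vector and each additional scheme only adds to it.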

Summary

Introduction

Experimentally determined or computationally predicted protein phosphorylation sites for distinct species are becoming increasingly common. We compare the predictive performance of a novel classification algorithm with different encoding schemes to develop a rice-specific protein phosphorylation site predictor. A series of algorithms has been developed to predict phosphorylation sites from amino acid sequence. These range from simple motif or pattern searches to more complex machine learning methods such as Artificial Neural Networks (ANN) and Support Vector Machines (SVM). PhosPhAt, which predicts phosphorylated serine sites in Arabidopsis, has been found to perform better on Arabidopsis sequences than generic predictors [13]. A protein family-specific phosphorylation site predictor, PhosTryp, was developed for the Trypanosomatidae family of parasitic protozoa [14].

