Abstract

Protein methylation is a reversible post-translational modification (PTM) that plays vital roles in many cellular processes, such as transcriptional activity and DNA repair. Experimental identification of methylation sites on proteins without prior knowledge is costly and time-consuming. In silico prediction of methylation sites might not only provide researchers with candidate sites for further experimental determination, but also facilitate downstream characterization and site-specific investigation. In the present study, a novel approach based on Bi-profile Bayes feature extraction combined with support vector machines (SVMs) was employed to develop a model for the Prediction of Protein Methylation Sites (BPB-PPMS) from primary sequence. Methylation can occur at many residues, including arginine, lysine, histidine, glutamine, and proline. At present, BPB-PPMS is designed to predict methylation status only for lysine and arginine residues, because there are not enough experimentally verified data to build and train prediction models for other residues. In 5-fold cross-validation experiments, BPB-PPMS achieved a sensitivity of 74.71%, a specificity of 94.32% and an accuracy of 87.98% for arginine, and a sensitivity of 70.05%, a specificity of 77.08% and an accuracy of 75.51% for lysine. Results from the cross-validation experiments and from tests on independent data sets suggest that BPB-PPMS might facilitate the identification and annotation of protein methylation. In addition, BPB-PPMS can readily be extended to build predictors for other types of PTM sites. For public access, BPB-PPMS is available at http://www.bioinfo.bio.cuhk.edu.hk/bpbppms.
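The abstract names Bi-profile Bayes feature extraction without describing it. As a rough illustration only, the sketch below assumes that each peptide window is encoded by the position-specific frequency of its residues in the positive (methylated) training peptides, followed by the same frequencies in the negative peptides. The function names, the three-residue window and the toy peptides are all hypothetical and are not taken from the study; the resulting vectors are what would then be passed to an SVM.

```python
from collections import Counter


def position_profile(peptides):
    """Per-position amino acid frequencies for equal-length peptides."""
    length = len(peptides[0])
    profile = []
    for pos in range(length):
        counts = Counter(p[pos] for p in peptides)
        total = sum(counts.values())
        profile.append({aa: n / total for aa, n in counts.items()})
    return profile


def bpb_features(peptide, pos_profile, neg_profile):
    """Concatenate the residue frequencies from the positive and negative profiles.

    Assumption: this two-profile layout is a simplified reading of
    "Bi-profile Bayes"; the paper's exact formulation may differ.
    """
    pos_part = [pos_profile[i].get(aa, 0.0) for i, aa in enumerate(peptide)]
    neg_part = [neg_profile[i].get(aa, 0.0) for i, aa in enumerate(peptide)]
    return pos_part + neg_part


# Toy, made-up peptides centred on an arginine (not real data).
positives = ["GRG", "GRA", "ARG"]
negatives = ["LRK", "ERL", "LRE"]

pos_profile = position_profile(positives)
neg_profile = position_profile(negatives)
features = bpb_features("GRG", pos_profile, neg_profile)
```

A window of length n yields a 2n-dimensional vector, so the feature size grows with the window rather than with the alphabet.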

Highlights

  • Many proteins experience post-translational modifications through which they present structural as well as functional diversity and play important roles in many biological processes

  • In 5-fold cross-validation, BPB-PPMS achieves a sensitivity of 74.71%, a specificity of 94.32% and an accuracy of 87.98% for arginine, and a sensitivity of 70.05%, a specificity of 77.08% and an accuracy of 75.51% for lysine

  • The performance of BPB-PPMS was further assessed on independent datasets, obtained by randomly choosing from the PubMed literature proteins with experimentally verified arginine and lysine methylation that are non-homologous to the proteins used for training BPB-PPMS
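The sensitivity, specificity and accuracy figures quoted above follow the usual confusion-matrix definitions. The sketch below shows how they relate to one another; the function name and the counts are hypothetical and are not taken from the study.

```python
def classification_metrics(tp, fn, tn, fp):
    """Return (sensitivity, specificity, accuracy) from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)  # fraction of true sites that are recovered
    specificity = tn / (tn + fp)  # fraction of non-sites that are rejected
    accuracy = (tp + tn) / (tp + fn + tn + fp)
    return sensitivity, specificity, accuracy


# Hypothetical counts: 100 methylated sites, 200 non-methylated sites.
sn, sp, acc = classification_metrics(tp=80, fn=20, tn=180, fp=20)
```

Because accuracy is a weighted average of sensitivity and specificity, it is pulled toward specificity whenever negative examples outnumber positive ones.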


Summary

Introduction

Many proteins undergo post-translational modifications, through which they gain structural and functional diversity and play important roles in many biological processes. Et al. [7] built a predictor for arginine and lysine methylation using SVMs, based on the hypothesis that PTMs preferentially occur in intrinsically disordered regions. They collected positive training datasets (methylated sites) from the SWISS-PROT database (release 45) [8] and negative training datasets (nonmethylated sites) from the same proteins, comprising all arginines and lysines not marked as methylated. Examples in the training datasets were encoded by a set of features including amino acid frequencies, aromatic content, flexibility scalar, net charge, hydrophobic moment, beta entropy and disorder information, as well as PSI-BLAST profiles. Another team, Chen et al. [9], constructed the first online server, MeMo, for arginine and lysine methylation prediction using an SVM strategy. Examples in the training datasets were represented by an orthogonal binary coding scheme.
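For readers unfamiliar with orthogonal binary coding, the sketch below shows the general idea: each residue becomes a 20-dimensional vector with a single 1 at the position of its amino acid. The alphabet ordering, the function name and the handling of unknown residues are assumptions made for illustration and may not match the MeMo implementation.

```python
# Assumed ordering of the 20 standard amino acids (one-letter codes).
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
AA_INDEX = {aa: i for i, aa in enumerate(AMINO_ACIDS)}


def orthogonal_binary_encode(peptide):
    """Encode a peptide as concatenated 20-bit one-hot vectors."""
    vector = []
    for residue in peptide:
        bits = [0] * len(AMINO_ACIDS)
        # Residues outside the alphabet (e.g. padding) stay all-zero here.
        if residue in AA_INDEX:
            bits[AA_INDEX[residue]] = 1
        vector.extend(bits)
    return vector


encoded = orthogonal_binary_encode("AKR")
```

A window of n residues therefore yields a sparse 20n-dimensional input for the SVM.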

