Abstract

Identification of protein phosphorylation sites with their cognate protein kinases (PKs) is a key step to delineate the molecular dynamics and plasticity underlying a variety of cellular processes. Although nearly 10 kinase-specific prediction programs have been developed, numerous PKs have been classified ad hoc into subgroups without a standard rule. Moreover, the false positive rate of large-scale predictions has never been addressed. In this work, we adopted a well-established rule to classify PKs into a hierarchical structure with four levels: group, family, subfamily, and single PK. In addition, we developed a simple approach to estimate the theoretically maximal false positive rates. The online service and local packages of GPS (Group-based Prediction System) 2.0 were implemented in Java with a modified version of the Group-based Phosphorylation Scoring algorithm. As the first stand-alone software for predicting phosphorylation sites, GPS 2.0 can predict kinase-specific phosphorylation sites for 408 human PKs in hierarchy. GPS 2.0 was applied to a large-scale prediction of more than 13,000 mammalian phosphorylation sites and showed high performance and accuracy. Using Aurora-B as an example, we also conducted a proteome-wide search and systematically predicted Aurora-B-specific substrates, incorporating protein-protein interaction information. Thus, GPS 2.0 is a useful tool for predicting protein phosphorylation sites and their cognate kinases, and it is freely available online.
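The four-level hierarchy (group, family, subfamily, single PK) can be pictured as a tree in which each cluster is trained on the sites annotated to it and to all of its members. The Python sketch below is illustrative only: the class `KinaseNode`, its `training_sites` method, and placeholder site identifiers such as `s1` are hypothetical, and the pooling rule is an assumption modeled on the AGC/AKT example given under Experimental Procedures.

```python
from dataclasses import dataclass, field


@dataclass
class KinaseNode:
    """Hypothetical node in a four-level hierarchy: group > family > subfamily > PK."""
    name: str
    level: str  # "group", "family", "subfamily" or "PK"
    sites: set[str] = field(default_factory=set)  # sites annotated directly to this node
    children: list["KinaseNode"] = field(default_factory=list)

    def add(self, child: "KinaseNode") -> "KinaseNode":
        """Attach a member cluster or single PK and return it."""
        self.children.append(child)
        return child

    def training_sites(self) -> set[str]:
        """Assumed pooling rule: own sites plus the sites of every descendant."""
        pooled = set(self.sites)
        for child in self.children:
            pooled |= child.training_sites()
        return pooled


# Toy tree; the site IDs are placeholders, not real data.
agc = KinaseNode("AGC", "group", {"s9"})            # sites of "other AGC kinases"
akt = agc.add(KinaseNode("AKT", "family", {"s1"}))  # sites annotated as PKB_group
pkbb = akt.add(KinaseNode("PKBβ", "PK", {"s2"}))
pka = agc.add(KinaseNode("PKA", "family", {"s3"}))  # sites annotated as PKA_group
pkaa = pka.add(KinaseNode("PKAα", "PK", {"s4"}))

print(sorted(akt.training_sites()))  # ['s1', 's2']
```

Training a group-level predictor on the pooled set trades specificity for more data, which is one reading of why predictions are offered at every level of the hierarchy.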

Highlights

  • Identification of protein phosphorylation sites with their cognate protein kinases (PKs) is a key step to delineate molecular dynamics and plasticity underlying a variety of cellular processes

  • An extensively adopted hypothesis for predicting kinase-specific phosphorylation sites is that PKs in the same group/subfamily recognize similar sequence patterns in their substrates (9–19)

  • Numerous PKs were classified into several groups based on sequence comparison by BLAST (9–19)
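To make the group-based hypothesis concrete, a candidate site can be scored by how closely its flanking peptide resembles the known sites of a kinase cluster. The sketch below is a simplified illustration, not the GPS 2.0 implementation: `toy_substitution` is a hypothetical stand-in for a real amino-acid substitution matrix, clipping negative similarities to zero and averaging over the cluster are assumptions made for illustration, and the example peptides are invented.

```python
from typing import Callable, Iterable

Substitution = Callable[[str, str], int]


def toy_substitution(a: str, b: str) -> int:
    """Hypothetical scorer: +2 for identical residues, -1 otherwise."""
    return 2 if a == b else -1


def peptide_similarity(query: str, known: str,
                       subst: Substitution = toy_substitution) -> int:
    """Sum position-wise substitution scores; negative totals are clipped to 0."""
    if len(query) != len(known):
        raise ValueError("peptides must be aligned windows of equal length")
    total = sum(subst(a, b) for a, b in zip(query, known))
    return max(total, 0)


def cluster_score(query: str, known_sites: Iterable[str],
                  subst: Substitution = toy_substitution) -> float:
    """Average similarity of a query window to all known sites of one cluster."""
    sims = [peptide_similarity(query, k, subst) for k in known_sites]
    return sum(sims) / len(sims) if sims else 0.0


# Invented peptides centered on the phosphoacceptor S (4th position).
print(cluster_score("RRASL", ["RRASV", "RKSSV"]))  # 4.0
```

Because the substitution function is a parameter, a full 20 × 20 matrix can be swapped in without changing the scoring logic.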


Summary

EXPERIMENTAL PROCEDURES

Protein Kinase Classification for the Training Data Set. The training data set was derived from Phospho.ELM 6.0 [21] and includes 13,615 experimentally verified phosphorylation sites. For the AGC group, the experimentally verified sites annotated with PKB_group, PKBβ, PKAα, PKA_group, and other AGC kinases were used as training data. For the AGC/AKT family, only the verified sites annotated with PKB_group and PKBβ were used. For the AGC/PKA family, only the verified sites annotated with PKAα and PKA_group were used. We found that PKG1 has two paralogs in humans rather than a single gene. In this regard, the total human kinome contains 519 unique PKs. As previously described, we used the experimentally verified phosphorylation sites as the positive data (+), whereas all other residues (Ser/Thr or Tyr) in the same substrates were regarded as the negative data (−) (10–12, 15–17). The results of n-fold cross-validation were very similar to those of leave-one-out validation (see supplemental Fig. S1).
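The positive/negative labeling and the n-fold cross-validation described in this paragraph can be sketched as follows. This is a hypothetical illustration: the function names, the 0-based positions, the ±7-residue window (`flank=7`), the `*` padding character, and the fixed shuffle seed are assumptions, and the toy sequence is invented.

```python
import random

RESIDUES_ST = {"S", "T"}  # candidates for Ser/Thr kinases
RESIDUES_Y = {"Y"}        # candidates for Tyr kinases


def label_residues(sequence: str, verified: set[int],
                   residues: set[str]) -> tuple[list[int], list[int]]:
    """Positives = verified sites; negatives = all other candidate residues (0-based)."""
    candidates = [i for i, aa in enumerate(sequence) if aa in residues]
    positives = [i for i in candidates if i in verified]
    negatives = [i for i in candidates if i not in verified]
    return positives, negatives


def window(sequence: str, pos: int, flank: int = 7, pad: str = "*") -> str:
    """Peptide of length 2*flank+1 centered on pos, padded at the termini."""
    padded = pad * flank + sequence + pad * flank
    return padded[pos:pos + 2 * flank + 1]


def n_fold_splits(items: list, n: int, seed: int = 0):
    """Yield (train, test) pairs; each item lands in exactly one test fold."""
    shuffled = items[:]
    random.Random(seed).shuffle(shuffled)
    folds = [shuffled[k::n] for k in range(n)]
    for k in range(n):
        train = [x for j, fold in enumerate(folds) if j != k for x in fold]
        yield train, folds[k]


seq = "MSRKSTPYAS"  # invented toy substrate with one verified site at index 4
pos, neg = label_residues(seq, {4}, RESIDUES_ST)
print(pos, neg)                  # [4] [1, 5, 9]
print(window(seq, 4, flank=2))   # RKSTP
```

Setting `n` equal to the number of items turns the n-fold split into leave-one-out validation, which is the comparison reported above.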

≤ i ≤ 15
RESULTS
DISCUSSION