Abstract
Protein phosphorylation is a major form of post-translational modification (PTM) that regulates diverse cellular processes. In silico methods for phosphorylation site prediction can provide a useful and complementary strategy for complete phosphoproteome annotation. Here, we present a novel bioinformatics tool, PhosphoPredict, that combines protein sequence and functional features to predict kinase-specific substrates and their associated phosphorylation sites for 12 human kinases and kinase families, including ATM, CDKs, GSK-3, MAPKs, PKA, PKB, PKC, and SRC. To elucidate critical determinants, we identified feature subsets that were most informative and relevant for predicting substrate specificity for each individual kinase family. Extensive benchmarking experiments based on both five-fold cross-validation and independent tests indicated that the performance of PhosphoPredict is competitive with that of several other popular prediction tools, including KinasePhos, PPSP, GPS, and Musite. We found that combining protein functional and sequence features significantly improves phosphorylation site prediction performance across all kinases. Application of PhosphoPredict to the entire human proteome identified 150 to 800 potential phosphorylation substrates for each of the 12 kinases or kinase families. PhosphoPredict significantly extends the bioinformatics portfolio for kinase function analysis and will facilitate high-throughput identification of kinase-specific phosphorylation sites, thereby contributing to both basic and translational research programs.
Highlights
Background set: All human proteins were extracted from the UniProt database [46] and used as the background protein set
We present PhosphoPredict, a new tool developed for computational prediction of human kinase-specific phosphorylation sites
Identifying protein phosphorylation sites is a crucial step in understanding regulatory functions in biological systems
Summary
Background set: All human proteins were extracted from the UniProt database [46] and used as the background protein set. The background set was used to perform statistical analysis and to identify statistically significant functional features (see details below). The negative samples were randomly selected from the background set. We derived a variety of features and examined their impact on model performance. In addition to sequence-derived and functional features, we integrated structural features, including protein secondary structure, solvent accessibility, and native disorder, which have proven useful in previous studies of phosphorylation site prediction. These features are briefly discussed in the following subsections.
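The random selection of negative samples from the background set can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `sample_negative_sites`, the candidate residues (S, T, Y), the flanking window size, and the `-` padding character are all assumptions made for illustration, since the text above does not specify them.

```python
import random


def sample_negative_sites(background, positives, n, residues="STY", flank=7, seed=0):
    """Randomly pick up to n candidate sites that are not known positives.

    background: dict mapping protein ID -> amino acid sequence (hypothetical input)
    positives:  set of (protein ID, 0-based position) known phosphorylation sites
    residues:   candidate residue types (assumed here to be S, T and Y)
    flank:      residues kept on each side of the site (assumed window size)
    """
    rng = random.Random(seed)  # fixed seed so the sampling is reproducible

    # Every S/T/Y position in the background set that is not a known positive.
    candidates = [
        (pid, i)
        for pid, seq in sorted(background.items())
        for i, aa in enumerate(seq)
        if aa in residues and (pid, i) not in positives
    ]

    chosen = rng.sample(candidates, min(n, len(candidates)))

    # Cut a fixed-width window around each site, padding with '-' at the termini.
    windows = []
    for pid, i in chosen:
        seq = background[pid]
        left = seq[max(0, i - flank):i].rjust(flank, "-")
        right = seq[i + 1:i + 1 + flank].ljust(flank, "-")
        windows.append((pid, i, left + seq[i] + right))
    return windows


if __name__ == "__main__":
    # Toy sequences, not real UniProt entries.
    toy_background = {"P1": "MSTAYKS", "P2": "AAAS"}
    toy_positives = {("P1", 1)}
    for record in sample_negative_sites(toy_background, toy_positives, n=3, flank=2):
        print(record)
```

In practice the number of negatives would typically be chosen relative to the number of positive sites for each kinase; that ratio is not given in the text above and is left to the caller here.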