Phosphorylation is an indispensable regulatory mechanism in cells, and phosphorylation at specific sites on kinases can significantly enhance their activity. Although several such critical phosphorylation sites (phos-sites) have been experimentally identified, many more remain to be explored. To date, no computational method exists to systematically identify these critical phos-sites on kinases. In this study, we introduce PhoSiteformer, a transformer-inspired foundational model designed to generate embeddings of phos-sites from phosphorylation mass spectrometry (phos-MS) data. Because protein sequence data and phos-MS data offer complementary insights, we also developed a classification model, CSPred, which uses a bimodal fusion strategy to combine embeddings from PhoSiteformer with those from the protein language model ProtT5. Our approach identified 77 critical phos-sites on 58 human kinases. Two of these sites, T517 on PRKG1 and T735 on PRKD3, have been experimentally verified. This study presents the first systematic computational approach to identifying critical phos-sites that enhance kinase activity.