Abstract

Mass spectrometry-based phosphoproteomics enables proteome-wide analysis of protein phosphorylation in biological samples. As a prime example, the Clinical Proteomic Tumor Analysis Consortium (CPTAC) has performed phosphoproteomic profiling for 1,191 tumors spanning 11 cancer types, generating quantitative data on 77,442 phosphosites. Despite this impressive data-generation power, fewer than 5% of these phosphosites have been associated with a regulatory kinase or biological function, and 90% of the annotated phosphosites were associated with the well-studied 20% of kinases. The “dark” phosphoproteome greatly limits our ability to gain functional insights into cancer signaling. Previous research has shown that co-regulation is a strong indicator of functional association. Here we leveraged machine learning and the vast amount of CPTAC data to build a co-regulation map of phosphosites to facilitate functional interpretation of phosphoproteomics findings. Based on a ground-truth dataset including 98,402 pairs of phosphosites known to be regulated by the same kinase (positives) and 1,317,273 pairs regulated by kinases from distant kinase families (negatives), we developed an Extreme Gradient Boosting (XGBoost) classifier to distinguish the positive and negative pairs using CPTAC phosphoproteomic data, protein-protein interaction data, and phosphopeptide sequences. Applying the trained classifier to 3 billion phosphosite pairs identified 2,569,519 (0.08%) with a high probability of co-regulation; in an independent ground-truth dataset, pairs at this probability level were 400 times more likely to be positive pairs than negative pairs.
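
The "400 times more likely" figure can be read as a likelihood ratio on the held-out ground truth: the fraction of positive pairs scoring above the cutoff divided by the fraction of negative pairs doing so. The abstract does not spell out the exact calculation, so the sketch below is one plausible reading rather than the authors' procedure. The function name `likelihood_ratio`, the toy scores, and the 0.7 cutoff are all hypothetical.

```python
def likelihood_ratio(scores, labels, threshold):
    """Ratio of the positive-pair hit rate to the negative-pair hit rate.

    scores    : predicted co-regulation probabilities, one per phosphosite pair
    labels    : 1 for same-kinase (positive) pairs, 0 for distant-family (negative) pairs
    threshold : probability cutoff defining a "high probability" pair (assumed value)
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative pair")

    # Fraction of each class that the classifier places above the cutoff.
    pos_rate = sum(s >= threshold for s in pos) / len(pos)
    neg_rate = sum(s >= threshold for s in neg) / len(neg)

    # No negative pair passes the cutoff: the ratio is unbounded.
    if neg_rate == 0:
        return float("inf")
    return pos_rate / neg_rate


# Toy held-out set (made-up numbers): 4 positive pairs, 8 negative pairs.
toy_scores = [0.90, 0.80, 0.30, 0.95,
              0.75, 0.10, 0.20, 0.05, 0.40, 0.15, 0.30, 0.25]
toy_labels = [1, 1, 1, 1,
              0, 0, 0, 0, 0, 0, 0, 0]

# 3 of 4 positives and 1 of 8 negatives pass the cutoff: 0.75 / 0.125
print(likelihood_ratio(toy_scores, toy_labels, threshold=0.7))  # 6.0
```

Because the ratio compares rates within each class, it does not depend on the roughly 13-to-1 imbalance between negative and positive pairs in the ground-truth set.
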
These pairs constituted a co-regulation map of 30,499 unique phosphosites, called CoPheeMap. To demonstrate the utility of CoPheeMap, we integrated its network embedding features and embedding features from a kinase network with Position-Specific Scoring Matrix scores and site-kinase abundance associations to develop an XGBoost model to predict kinase-substrate associations (KSAs). The resulting model, CoPheeKSA, showed superior performance with an AUROC of 0.97 and identified 12,000 high-quality novel KSAs involving 7,908 phosphosites. In another application, CoPheeMap-based information propagation (CoPheeProp) assigned 5,000 phosphosites to different signaling pathways with high specificity, increasing existing knowledge 9-fold. Applying CoPheeMap and the derived tools to CPTAC and other cancer phosphoproteomics datasets revealed regulatory and functional information for previously unannotated sites showing strong associations with somatic mutations and cancer phenotypes, leading to actionable mechanistic and therapeutic insights. Together, CoPheeMap, CoPheeKSA, and CoPheeProp provide a systematic framework to illuminate the dark phosphoproteome and have broad applications ranging from basic cancer biology research to clinical investigations.

Citation Format: Wen Jiang, Eric Jaehnig, Bing Zhang. CoPheeMap: a co-regulation map of 30,000 phosphosite illuminates the dark cancer phosphoproteome [abstract]. In: Proceedings of the American Association for Cancer Research Annual Meeting 2023; Part 1 (Regular and Invited Abstracts); 2023 Apr 14-19; Orlando, FL. Philadelphia (PA): AACR; Cancer Res 2023;83(7_Suppl):Abstract nr 2023.
