Abstract: Post-translational modifications (PTMs) play key roles in regulating cell signaling in health and disease. Recent advances in mass spectrometry enable comprehensive investigation of PTMs. However, identifying functionally relevant PTMs and protein domains implicated in disease remains a key challenge. We present CLUMPS-PTM, an algorithm for spatial clustering of PTMs to propose functional domains. We apply CLUMPS-PTM to the Clinical Proteomic Tumor Analysis Consortium (CPTAC) dataset, one of the largest proteogenomic pan-cancer PTM datasets, comprising 1110 samples spanning 11 tumor types. First, we map over 14K phospho-sites and 13K acetyl-sites detected in the CPTAC dataset to Protein Data Bank (PDB) files. We then extend the mapping to the AlphaFold-predicted proteome and recover an additional 12K phospho-sites and 10K acetyl-sites on residues with model confidence (pLDDT) above 70. Consistent with previous research, phospho-sites fall on residues with lower AlphaFold prediction confidence than acetyl-sites (p-value < 1e-4), reflecting the abundance of phospho-sites in unstructured domains. To identify functionally relevant protein domains in the pan-cancer context, we characterize the 1110 samples by applying SignatureAnalyzer, a Bayesian non-negative matrix factorization method, to discover multi-omic patterns. This analysis identifies strong signatures associated with DNA damage response (DDR) and stratifies the CPTAC cohort into DDR-high (n=442) and DDR-low (n=668) groups. Next, we apply CLUMPS-PTM to the phospho-sites and acetyl-sites that are differentially modified between these groups. We identify 22 proteins, including GMPS, NCBP1, and ANPEP (CD13), with significant clusters of differentially phosphorylated sites (q-value < 0.1) in DDR-high samples. ANPEP, a zinc-dependent aminopeptidase, has a significant cluster (q-value = 0.0129) of differentially phosphorylated sites in the ERAP1-C domain, which faces the peptidase active sites known to have proliferative activity via the MAPK pathway. 
CLUMPS-PTM also identifies a significant cluster of acetyl-sites on ARID1A, a SWI/SNF chromatin remodeler known to be mutated in cancer, in DDR-high samples (q-value = 0.085). The ARID1A cluster lies on the C-terminal tail of the protein within the glucocorticoid receptor (GR) domain; GR is an upstream regulator of proteins that increase catabolism, reduce inflammation, and increase cell survival. Increased acetylation may block a ubiquitination site at the C-terminus of the protein, which would likely reduce ARID1A degradation and thereby increase both ARID1A protein levels and GR signaling. Overall, CLUMPS-PTM is an open-source tool that enables near proteome-wide spatial analysis of the growing body of PTM data. We anticipate that it will be useful to the broader proteomic community for discovering novel targets and generating insights into functional mechanisms.

Citation Format: Shankara K. Anand, Yifat Geffen, Yo Akiyama, Francois Aguet, Gad Getz. CLUMPS-PTM: Spatial clustering of post-translational modifications across cancer types [abstract]. In: Proceedings of the American Association for Cancer Research Annual Meeting 2023; Part 1 (Regular and Invited Abstracts); 2023 Apr 14-19; Orlando, FL. Philadelphia (PA): AACR; Cancer Res 2023;83(7_Suppl):Abstract nr 3132.