Abstract

Gene set analysis (GSA) has become a common methodology for analyzing transcriptomics data. However, self-contained GSA techniques are rarely, if ever, used for proteomics data analysis. Here we present a self-contained proteome-level GSA of the four consensus molecular subtypes (CMSs) previously established by transcriptome dissection of colon carcinoma specimens. Despite notable differences in the structure of proteomics and transcriptomics data, many pathway-wide characteristic features of CMSs found at the mRNA level were reproduced at the protein level. In particular, CMS1 features show heavy involvement of the immune system as well as pathways related to mismatch repair, DNA replication and proteasome function, while CMS4 tumors upregulate the complement pathway and proteins participating in epithelial-to-mesenchymal transition (EMT). In addition, protein-level GSA yielded a set of novel observations visible at the proteome, but not at the transcriptome level, including possible involvement of major histocompatibility complex II (MHC-II) antigens in the known immunogenicity of CMS1 and a connection between cholesterol trafficking and the regulation of integrin-linked kinase (ILK) in CMS3. Overall, this study demonstrates the utility of self-contained GSA approaches as a critical tool for analyzing proteomics data in general and dissecting protein-level molecular portraits of human tumors in particular.

Highlights

  • The CRC Subtyping Consortium (CRCSC) (1) re-classified merged datasets compiled from the data produced by all groups providing the original algorithms, (2) calculated a similarity matrix based on Jaccard coefficients between all subtypes, (3) retained only subtypes with statistically significant associations, (4) formed a network of subtypes, and (5) used the Markov Cluster algorithm to split the network into four molecular subgroups named “Consensus Molecular Subtypes” (CMS) [40,41]

  • When the samples for all four CMSs (N = 86) were analyzed by PCA based on their proteome features, the separation of subtypes was rather poor, with only CMS1 visibly separated from the rest (Fig 1)

  • We set out to determine whether any protein-level Kyoto Encyclopedia of Genes and Genomes (KEGG) pathways (167 in the Molecular Signatures Database (MSigDB) C2 collection) were differentially expressed between CMSs
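
Step (2) of the CRCSC procedure in the first highlight, the Jaccard-based similarity matrix, can be illustrated with a minimal sketch. The function names and the toy sample assignments below are hypothetical and are not taken from the CRCSC pipeline; the sketch only shows how a Jaccard coefficient between two subtypes (treated as sets of sample IDs) would be computed.

```python
from itertools import combinations


def jaccard(a, b):
    """Jaccard coefficient |A & B| / |A | B| between two sets of sample IDs."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def jaccard_matrix(subtypes):
    """Pairwise Jaccard coefficients for a dict {subtype name: sample IDs}."""
    names = sorted(subtypes)
    sim = {(n, n): 1.0 for n in names}
    for x, y in combinations(names, 2):
        sim[(x, y)] = sim[(y, x)] = jaccard(subtypes[x], subtypes[y])
    return names, sim


# Hypothetical toy data: two classifiers ("A" and "B"), each assigning
# the same six samples to two subtypes.
toy = {
    "A.1": {"s1", "s2", "s3"},
    "A.2": {"s4", "s5", "s6"},
    "B.1": {"s1", "s2", "s4"},
    "B.2": {"s3", "s5", "s6"},
}
names, sim = jaccard_matrix(toy)
```

In this toy example, subtypes A.1 and B.1 share two of the four samples in their union, so their coefficient is 0.5, while A.1 and A.2 come from the same classifier, share no samples, and score 0. The later steps (significance filtering and Markov clustering of the resulting network) are not covered by this sketch.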


Summary

Introduction

Proteome-transcriptome alignment of colon cancer subtypes achieved by self-contained gene set analysis compared to competitive tests [1,8,22]. To bridge this gap, here we demonstrate the utility of self-contained GSA approaches in analyzing consensus colon cancer subtypes using proteomics data. The CRCSC (1) re-classified merged datasets compiled from the data produced by all groups providing the original algorithms, (2) calculated a similarity matrix based on Jaccard coefficients between all subtypes, (3) retained only subtypes with statistically significant associations, (4) formed a network of subtypes, and (5) used the Markov Cluster algorithm to split the network into four molecular subgroups named “Consensus Molecular Subtypes” (CMS) [40,41]. These include CMS1, defined by a high mutation rate and encompassing most microsatellite-instable (MSI) tumors with inactivating alterations in mismatch repair (MMR) genes. We re-analyze previously published proteomes of CRC to elucidate to what extent transcriptionally identified CMSs are detectable at the proteome level with self-contained GSA tests, and whether new pathways can be detected with such tests.
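
To make the notion of a self-contained test concrete, the following is a minimal sketch of a generic sample-label permutation test for a single pathway. It is not necessarily one of the tests used in this study: the set statistic (sum of squared differences in group means), the function names, and the toy protein abundances are all assumptions chosen for illustration. The defining feature of a self-contained test is that the null distribution comes from permuting sample labels within the gene set itself, rather than from comparing the set against the remaining genes as competitive tests do.

```python
import random
from statistics import mean


def set_statistic(expr, labels):
    """Sum over proteins of the squared difference in group means.

    expr   : list of rows, one per protein in the set, one value per sample
    labels : list of 0/1 group labels, one per sample
    """
    g1 = [i for i, lab in enumerate(labels) if lab]
    g0 = [i for i, lab in enumerate(labels) if not lab]
    total = 0.0
    for row in expr:
        d = mean(row[i] for i in g1) - mean(row[i] for i in g0)
        total += d * d
    return total


def self_contained_test(expr, labels, n_perm=999, seed=0):
    """Permutation p-value for H0: no protein in the set differs between groups."""
    rng = random.Random(seed)
    observed = set_statistic(expr, labels)
    perm = list(labels)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(perm)  # permute sample labels, keep the protein rows intact
        if set_statistic(expr, perm) >= observed:
            hits += 1
    # Add-one correction so the p-value is never exactly zero
    return observed, (hits + 1) / (n_perm + 1)


# Hypothetical toy pathway: 3 proteins x 8 samples (4 per group)
labels = [1, 1, 1, 1, 0, 0, 0, 0]
pathway = [
    [5.1, 4.9, 5.3, 5.0, 1.0, 1.2, 0.9, 1.1],
    [3.2, 3.0, 3.4, 3.1, 0.5, 0.7, 0.4, 0.6],
    [7.0, 7.2, 6.9, 7.1, 2.0, 2.1, 1.9, 2.2],
]
stat, p_value = self_contained_test(pathway, labels)

# A pathway with no group difference at all
flat = [[1.0] * 8, [2.0] * 8]
flat_stat, flat_p = self_contained_test(flat, labels)
```

Permuting whole samples, rather than individual proteins, preserves the correlation structure among proteins in the pathway. With only four samples per group, the smallest achievable p-value is limited by the number of distinct labelings, which is one reason small cohorts constrain this kind of analysis.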
