Abstract

High-throughput analyses of the transcriptome and proteome are individually used to interrogate complex oncogenic processes in cancer. However, an outstanding challenge is how to combine these complementary, yet partially disparate, data sources to accurately identify tumor-specific gene products and clinical biomarkers. Here, we introduce inteGREAT for robust and scalable differential integration of high-throughput measurements. With inteGREAT, each data source is represented as a co-expression network, which is analyzed to characterize the local and global structure of each node across networks. inteGREAT scores the degree to which the topology of each gene in both transcriptome and proteome networks is conserved within a tumor type, yet different from other normal or malignant cells. We demonstrated the high performance of inteGREAT based on several analyses: deconvolving synthetic networks, rediscovering known diagnostic biomarkers, establishing relationships between tumor lineages, and elucidating putative prognostic biomarkers, which we experimentally validated. Furthermore, we introduce the application of a clumpiness measure to quantitatively describe tumor lineage similarity. Taken together, inteGREAT not only infers functional and clinical insights from the integration of transcriptomic and proteomic data sources in cancer, but can also be readily applied to other heterogeneous high-throughput data sources. inteGREAT is open source and available to download from https://github.com/faryabib/inteGREAT.

Highlights

  • Cellular processes are tightly regulated in multiple layers, leading to coordinated function of genes and gene products including transcripts and proteins

  • The salient assumption underlying such comparative studies is that there is a one-to-one relationship between transcript and protein expression; however, previous studies have shown low correlation between these levels (Haider and Pal, 2013; Zhang et al, 2014). Another implicit assumption is that genome-scale technologies such as next-generation sequencing-based transcriptomics and mass spectrometry-based proteomics have comparable sensitivity to capture the activities of these gene products

  • We propose that expanding differential expression analysis from the individual level to differential integration can facilitate biomarker discovery


Summary

INTRODUCTION

Cellular processes are tightly regulated in multiple layers, leading to coordinated function of genes and gene products including transcripts and proteins. It has been shown that combined analysis of data characterizing a variety of biomolecules yields new insights into tumor biology and facilitates identification of important cancer genes and therapeutic targets (Chang et al, 2013; Zhang et al, 2014; Mertins et al, 2016; Zhang et al, 2016). These initiatives have increased interest in the development of methods for integration of heterogeneous data sources (Huang et al, 2013; Meng et al, 2016; Petralia et al, 2016). It is critical to effectively combine information gathered by complementary genome-scale measurements to elucidate common and different molecular features of tumor types. To address this challenge, methods to integrate heterogeneous data sources such as transcriptomic and proteomic data sets have been proposed (Haider and Pal, 2013). Our differential integration of transcript and protein abundance across four tumor types is a showcase of using inteGREAT for similar integration analysis in other cancers and diseases.
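
The abstract describes representing each data source as a co-expression network and then comparing the structure around each gene across networks. The sketch below illustrates that general idea only; it is not inteGREAT's implementation. Pearson correlation as the edge weight, cosine similarity as the cross-network comparison, the function names, and the toy abundance values are all assumptions made for illustration, since this text does not specify the measures inteGREAT uses.

```python
import math

def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        return 0.0  # constant profile: treat as uncorrelated
    return cov / (sx * sy)

def coexpression_network(expr):
    """Build a weighted network from {gene: [abundance per sample]}.

    Returns {gene: {other_gene: correlation}} (hypothetical layout).
    """
    genes = sorted(expr)
    return {g: {h: pearson(expr[g], expr[h]) for h in genes if h != g}
            for g in genes}

def cosine(u, v):
    """Cosine similarity of two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    if nu == 0 or nv == 0:
        return 0.0
    return dot / (nu * nv)

def cross_network_similarity(net_a, net_b):
    """Compare each shared gene's edge-weight profile across two networks."""
    shared = sorted(set(net_a) & set(net_b))
    scores = {}
    for g in shared:
        others = [h for h in shared if h != g]
        u = [net_a[g][h] for h in others]
        v = [net_b[g][h] for h in others]
        scores[g] = cosine(u, v)
    return scores

# Toy data (invented): four genes measured in five samples.
rna = {
    "A": [1, 2, 3, 4, 5],
    "B": [2, 4, 6, 8, 10],
    "C": [5, 4, 3, 2, 1],
    "D": [1, 3, 2, 5, 4],
}
# Protein abundances match the transcripts except for gene D, which is inverted.
protein = dict(rna, D=[5, 3, 4, 1, 2])

rna_net = coexpression_network(rna)
protein_net = coexpression_network(protein)
scores = cross_network_similarity(rna_net, protein_net)

for gene in sorted(scores):
    print(gene, round(scores[gene], 3))
```

In this toy example, gene D receives the lowest score because its correlations with every other gene change sign between the two networks, whereas genes whose neighborhoods are largely preserved score higher. A differential analysis in the spirit of the abstract would additionally contrast such scores between a tumor type and other normal or malignant samples, which this sketch does not attempt.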

MATERIALS AND METHODS
Correlation Network Generation
Vertex Similarity Calculation
Vertex Joining
RESULTS
Pan-Cancer Differential Integration Identifies Putative Prognostic Biomarkers
DISCUSSION
Findings
DATA AVAILABILITY STATEMENT