Abstract

Assessment of the tumor transcriptome via high‐throughput methods such as RNA‐sequencing (RNA‐seq) has yielded insights that advance understanding of tumor biology and facilitate drug discovery efforts. A challenge with such “omics” methods is technical variation between groups of samples (often termed “batch effects”), which arises from differences in sample collection, handling and processing, especially in large, multi‐center, consortium‐based efforts such as The Cancer Genome Atlas (TCGA). Approaches to correct for technical artifacts should seek to minimize variance introduced by known technical factors while preserving the biological variability within the dataset. We present a workflow to assess the magnitude of identifiable technical artifacts within RNA‐seq data in TCGA, to identify the most relevant sources of variation, and to correct for such artifacts using bioinformatics tools. We show that certain TCGA datasets are highly impacted by technical variation and that correcting for these artifacts can alter downstream analyses such as association studies and survival analysis. Correction using batch correction tools removes the influence of technical artifacts while preserving biological variation and the overall structure and integrity of the data. We use these corrected data to establish an updated list of genes with a significant impact on survival in a range of solid tumor types in TCGA. In addition, we use methods to identify networks of co‐expressed and co‐regulated genes and to identify genes that are sensitive to molecular features of tumors, such as the presence of specific driver mutations. Using these “tidied‐up” datasets, we analyzed the expression of G protein‐coupled receptors (GPCRs) in tumors to identify the cellular compartments within the tumor microenvironment that express particular GPCRs.
Given the widespread utility of GPCRs as pharmacological targets and numerous recent results re‐emphasizing their role in cancer, such analyses provide a valuable tool for drug discovery. We document a number of GPCRs that are widely expressed in immune cells in a range of solid tumors and whose functions may impact the phenotype of immune cells, including in cancer. Such widely expressed immune‐associated GPCRs are examples of novel putative targets in the rapidly growing field of immuno‐oncology. These analyses can be extended from GPCRs to other classes of targets, providing a general tool for using large gene expression databases to identify and evaluate genes that are likely to be expressed in specific cell types within tumors in vivo, thereby avoiding pitfalls that arise when working with isolated cells or single‐cell based approaches.

Support or Funding Information

ASPET David Lehr Award

