Leveraging global gene expression patterns to predict expression of unmeasured genes.

James Rudd, Casey S Greene, Jennifer A Doherty, Ellen L Goode, Eugene Demidenko, René A Zelaya

doi:10.1186/s12864-015-2250-5

Abstract

Background: Large collections of paraffin-embedded tissue represent a rich resource to test hypotheses based on gene expression patterns; however, measurement of genome-wide expression is cost-prohibitive on a large scale. Using the known expression correlation structure within a given disease type (in this case, high grade serous ovarian cancer; HGSC), we sought to identify reduced sets of directly measured (DM) genes which could accurately predict the expression of a maximized number of unmeasured genes.

Results: We developed a greedy gene set selection (GGS) algorithm which returns a DM set of user-specified size based on a specific correlation threshold (|rP|) and the minimum number of DM genes that must be correlated to an unmeasured gene in order to infer the value of the unmeasured gene (redundancy). We evaluated GGS in the Cancer Genome Atlas (TCGA) HGSC data across 144 combinations of DM size, redundancy (1–3), and |rP| (0.60, 0.65, 0.70). Across the parameter sweep, GGS allows on average 9 times more gene expression information to be captured compared to the DM set alone. GGS successfully augments prognostic HGSC gene sets; the addition of 20 GGS-selected genes more than doubles the number of genes whose expression is predictable. Moreover, the expression prediction is highly accurate. After training regression models for the predictable gene set using 2/3 of the TCGA data, the average accuracy (ranked correlation of true and predicted values) in the 1/3 testing partition and four independent populations is above 0.65 and approaches 0.8 for conservative parameter sets. We observe similar accuracies in the TCGA HGSC RNA-sequencing data. Specifically, the prediction accuracy increases with increasing redundancy and increasing |rP|.

Conclusions: GGS-selected genes, which maximize expression information about unmeasured genes, can be combined with candidate gene sets as a cost-effective way to increase the amount of gene expression information obtained in large studies. This method can be applied to any organism, model system, disease, or tissue type for which whole genome gene expression data exists.

Electronic supplementary material: The online version of this article (doi:10.1186/s12864-015-2250-5) contains supplementary material, which is available to authorized users.
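
The selection step described in the abstract (a correlation threshold |rP|, a redundancy requirement, and a DM set of fixed size) can be illustrated with a small greedy coverage procedure. This is a minimal sketch under stated assumptions, not the authors' implementation: the function names `pearson`, `build_neighbors`, and `greedy_select`, the gain rule (count the not-yet-selected neighbors that still fall short of the redundancy requirement), the tie-breaking by input order, and the toy expression values are all hypothetical.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / sqrt(sxx * syy)


def build_neighbors(expr, threshold):
    """Map each gene to the set of genes whose |r| with it meets the threshold."""
    genes = list(expr)
    nbrs = {g: set() for g in genes}
    for i, g in enumerate(genes):
        for h in genes[i + 1:]:
            if abs(pearson(expr[g], expr[h])) >= threshold:
                nbrs[g].add(h)
                nbrs[h].add(g)
    return nbrs


def greedy_select(nbrs, dm_size, redundancy):
    """Pick dm_size genes greedily; return (DM genes in pick order, predictable genes).

    Assumed gain rule: a candidate scores one point for every unselected
    neighbor that is still correlated with fewer than `redundancy` DM genes.
    """
    dm, chosen = [], set()
    hits = {g: 0 for g in nbrs}  # number of DM genes correlated with each gene
    for _ in range(min(dm_size, len(nbrs))):
        best, best_gain = None, -1
        for g in nbrs:
            if g in chosen:
                continue
            gain = sum(1 for h in nbrs[g]
                       if h not in chosen and hits[h] < redundancy)
            if gain > best_gain:  # strict '>' keeps the earliest gene on ties
                best, best_gain = g, gain
        dm.append(best)
        chosen.add(best)
        for h in nbrs[best]:
            hits[h] += 1
    # An unmeasured gene is predictable once enough DM genes correlate with it.
    predictable = sorted(g for g in nbrs
                         if g not in chosen and hits[g] >= redundancy)
    return dm, predictable


# Toy data (hypothetical): A, B, C are strongly correlated; D and E are not.
toy = {
    "A": [1, 2, 3, 4, 5],
    "B": [2, 4, 6, 8, 10],
    "C": [5, 4, 3, 2, 1],
    "D": [1, -1, 0, -1, 1],
    "E": [0, 1, 0, -1, 0],
}
nbrs = build_neighbors(toy, threshold=0.60)
dm, predictable = greedy_select(nbrs, dm_size=1, redundancy=1)
print(dm, predictable)
```

On the toy data, measuring one gene from the correlated trio makes the other two predictable at redundancy 1, whereas redundancy 2 requires two measured genes to make the third predictable. This mirrors the trade-off in the abstract: stricter redundancy covers fewer genes per measured gene.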

Highlights

Large collections of paraffin-embedded tissue represent a rich resource to test hypotheses based on gene expression patterns; however, measurement of genome-wide expression is cost-prohibitive on a large scale.
Our greedy gene set selection (GGS) algorithm uses pairwise gene expression correlation (Pearson's correlation coefficient, rP) to identify sets of correlated genes, and within those sets selects genes to measure directly and genes to infer from the directly measured genes. We applied this algorithm to the Cancer Genome Atlas (TCGA) high grade serous ovarian cancer (HGSC) data (Affymetrix HGU133a; Fig. 1a) and compared the ability of GGS to maximize the number of inferred genes, given a user-defined number of directly measured (DM) genes, with that of a ranked-degree gene selection.
To determine the extent to which the eligible genes represent a wide variety of biological processes, we performed enrichment analysis on the Protein Analysis Through Evolutionary Relationships (PANTHER) GO-slim biological process terms (223 terms) using the 3,695 eligible genes identified at the 0.60 threshold, with background frequencies determined by the 8,265 truly expressed genes.
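
The enrichment analysis in the last highlight compares how often a term appears among the eligible genes with how often it appears in the background of expressed genes. The text here does not state which test statistic was used, so the following is only an illustrative sketch that assumes a one-sided hypergeometric over-representation test with a Bonferroni correction across terms. The function names `hypergeom_upper_tail` and `term_enrichment` and the input layout (a dict from term name to gene set) are hypothetical.

```python
from math import comb


def hypergeom_upper_tail(k, n, K, N):
    """P(X >= k) when n genes are drawn without replacement from N genes,
    K of which are annotated with the term."""
    numerator = sum(comb(K, i) * comb(N - K, n - i)
                    for i in range(k, min(K, n) + 1))
    return numerator / comb(N, n)


def term_enrichment(eligible, background, term_to_genes):
    """Return {term: (overlap, p_value, bonferroni_p)} for each term.

    `eligible` plays the role of the eligible genes and `background` that of
    the expressed genes; annotations outside the background are ignored.
    """
    background = set(background)
    eligible = set(eligible) & background
    n, N = len(eligible), len(background)
    n_terms = len(term_to_genes)
    results = {}
    for term, genes in term_to_genes.items():
        annotated = set(genes) & background
        k = len(annotated & eligible)
        p = hypergeom_upper_tail(k, n, len(annotated), N)
        # Bonferroni: multiply by the number of terms tested, capped at 1.
        results[term] = (k, p, min(1.0, p * n_terms))
    return results
```

In the setting described above, `background` would hold the 8,265 expressed genes, `eligible` the 3,695 eligible genes, and `term_to_genes` the 223 GO-slim biological process terms.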

Summary

Introduction

Large collections of paraffin-embedded tissue represent a rich resource to test hypotheses based on gene expression patterns; however, measurement of genome-wide expression is cost-prohibitive on a large scale. Using the known expression correlation structure within a given disease type (in this case, high grade serous ovarian cancer; HGSC), we sought to identify reduced sets of directly measured (DM) genes which could accurately predict the expression of a maximized number of unmeasured genes. Just as it is important to select tag SNPs based on allele correlations in a population similar to the one being studied, it is important to use gene expression patterns from the specific tissue of interest [18]. We present a method of gene selection that can be combined with candidate gene sets as a cost-effective way to increase the amount of gene expression information obtained in large studies where using a genome-wide measurement platform is not feasible.

Journal: BMC Genomics
Publication Date: Dec 1, 2015
Citations: 29
License type: CC BY 4.0

Similar Papers

Abstract 2171: Leveraging global gene expression patterns to identify gene sets that predict expression of large numbers of unmeasured genes
James Rudd ... Jennifer A Doherty
Cancer Research | VOL. 75
01 Aug 2015

Abstract A14: TP53 missense mutations associate with different metabolic pathways
Linda E Kelemen ... David D Bowtell
Clinical Cancer Research | VOL. 24
01 Aug 2018

Abstract 5708: HOXA4 and HOXB3 gene expression signature as a biomarker of tumor recurrence in patients with high-grade serous ovarian cancer (HGSOC) following primary cytoreductive surgery and first-line adjuvant chemotherapy
Jai N Patel ... Chad Livasy
Cancer Research | VOL. 77
01 Jul 2017

Abstract 3407: Gene expression subtypes of high grade serous ovarian cancer in African American women
Jennifer A Doherty ... Paul Terry
Cancer Research | VOL. 76
15 Jul 2016
