Abstract

Background: In genomics, we often assume that continuous data, such as gene expression, follow a specific kind of distribution. However, we rarely stop to question the validity of this assumption, or consider how broadly it applies to all genes in the transcriptome. Our study investigated the prevalence of a range of gene expression distributions in three different tumor types from the Cancer Genome Atlas (TCGA).

Results: Surprisingly, the expression of fewer than 50% of all genes was Normally distributed, with other distributions including Gamma, Bimodal, Cauchy, and Lognormal also represented. Most of the distribution categories contained genes that were significantly enriched for unique biological processes. Different assumptions based on the shape of the expression profile were used to identify genes that could discriminate between patients with good versus poor survival. The prognostic marker genes identified when the shape of the distribution was accounted for reflected functional insights into cancer biology that were not observed when standard assumptions were applied. We showed that when multiple types of distributions were permitted, i.e. when the shape of the expression profile was used, the statistical classifiers predicted patient prognosis more accurately than classifiers that assumed only one type of gene expression distribution.

Conclusions: Our results highlight the value of studying a gene's distribution shape to model heterogeneity of transcriptomic data, and the impact of using analyses that permit more than one type of gene expression distribution. These insights would have been overlooked when using standard approaches that assume all genes follow the same type of distribution in a patient cohort.

Highlights

  • In genomics, we often assume that continuous data, such as gene expression, follow a specific kind of distribution

  • This study identified the prevalence of genes with non-Normal gene expression distributions within cancer patient cohorts for acute myeloid leukemia (AML), ovarian cancer (OV), and glioblastoma multiforme (GBM) from the Cancer Genome Atlas (TCGA)

  • We tested the utility of incorporating assumptions based on the gene expression distribution into survival analysis models for the three cancer patient cohorts


Summary

Introduction

We often assume that continuous data, such as gene expression, follow a specific kind of distribution. Our study investigated the prevalence of a range of gene expression distributions in three different tumor types from the Cancer Genome Atlas (TCGA). As we begin to learn more about the diversity of gene expression in human populations, we call into question the relevance of assuming that the transcriptome can be uniformly modeled by just one distribution. Using TCGA as a platform to investigate this question, we show that more than half of the genes in the cancer transcriptome are non-Normally distributed for multiple tumor types. Incorporating assumptions based on multiple distribution categories into the analysis of gene expression revealed information for understanding the transcriptional control of cancer that would have been missed using standard approaches.
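To make the idea of assigning each gene to a distribution category concrete, the following is a minimal sketch, not the procedure used in the study (which this summary does not describe). It assumes that candidate families are fitted by maximum likelihood with SciPy and compared by the Akaike information criterion (AIC). The function name `classify_distribution`, the candidate list, and the use of AIC are all illustrative assumptions. Bimodal genes are not handled here; detecting them would require something like a mixture model.

```python
import numpy as np
from scipy import stats

# Illustrative candidate families (an assumption, not the study's list or method).
CANDIDATES = {
    "Normal": stats.norm,
    "Lognormal": stats.lognorm,
    "Gamma": stats.gamma,
    "Cauchy": stats.cauchy,
}

# Families that are only defined for strictly positive values once loc is fixed at 0.
POSITIVE_ONLY = {"Lognormal", "Gamma"}


def classify_distribution(values):
    """Return the name of the candidate family with the lowest AIC
    for one gene's expression values across a patient cohort."""
    values = np.asarray(values, dtype=float)
    best_name, best_aic = None, np.inf
    for name, dist in CANDIDATES.items():
        if name in POSITIVE_ONLY:
            # Skip positive-support families when the data contain non-positive values.
            if np.any(values <= 0):
                continue
            params = dist.fit(values, floc=0)  # fix the location at zero
            n_free = len(params) - 1           # loc was fixed, so it is not a free parameter
        else:
            params = dist.fit(values)
            n_free = len(params)
        log_lik = np.sum(dist.logpdf(values, *params))
        aic = 2 * n_free - 2 * log_lik
        if aic < best_aic:
            best_name, best_aic = name, aic
    return best_name


if __name__ == "__main__":
    # Synthetic heavy-tailed data standing in for one gene's expression profile.
    rng = np.random.default_rng(0)
    print(classify_distribution(rng.standard_cauchy(500)))
```

In practice such a function would be applied to each gene (each row of an expression matrix) in turn. A real analysis would also need to correct for testing many genes and to check goodness of fit, since the lowest-AIC family may still describe the data poorly.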
