Abstract
Gene signatures are increasingly used to interpret the results of omics data analyses, but they suffer from compositional (large overlap) and functional (correlated read-outs) redundancy. Moreover, many gene signatures rarely come out as significant in statistical tests. Based on a pan-cancer data analysis, we construct a restricted set of 962 signatures defined as informative and demonstrate that they are more likely to appear enriched in comparative cancer studies. We show that the majority of informative signatures conserve the weights of their constituent genes (eigengenes) from one cancer type to another. Finally, we construct InfoSigMap, an interactive online map of these signatures and their cross-correlations. This map highlights the structure of compositional and functional redundancies between informative signatures and charts the territories of biological functions. InfoSigMap can be used to visualize the results of omics data analyses, and it suggests a rearrangement of existing gene sets.
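To make the idea of conserved eigengene weights concrete, here is a minimal sketch. It assumes that a signature's eigengene is the first principal component of the standardized genes × samples expression matrix restricted to the signature genes, and that the gene weights are that component's loadings, estimated here by power iteration. It also assumes that conservation between two cancer types can be scored with the absolute cosine similarity of the two weight vectors. The exact definitions and criteria used in the study are not given here, and the function names (`eigengene_weights`, `weight_conservation`) are hypothetical.

```python
import math
import random
from typing import List

Matrix = List[List[float]]  # rows = genes of one signature, columns = samples


def standardize(rows: Matrix) -> Matrix:
    """Center each gene to mean 0 and scale it to unit variance across samples."""
    out = []
    for row in rows:
        n = len(row)
        mean = sum(row) / n
        sd = math.sqrt(sum((x - mean) ** 2 for x in row) / n) or 1.0  # guard constant genes
        out.append([(x - mean) / sd for x in row])
    return out


def eigengene_weights(rows: Matrix, iters: int = 200, seed: int = 0) -> List[float]:
    """Hypothetical helper: gene weights of the first principal component, i.e. the
    leading eigenvector of the gene-gene covariance matrix, found by power iteration."""
    z = standardize(rows)
    g, n = len(z), len(z[0])
    # Gene-by-gene covariance of standardized genes (= correlation matrix).
    cov = [[sum(z[i][k] * z[j][k] for k in range(n)) / n for j in range(g)] for i in range(g)]
    rng = random.Random(seed)
    v = [rng.random() + 0.1 for _ in range(g)]  # arbitrary non-zero start vector
    for _ in range(iters):
        w = [sum(cov[i][j] * v[j] for j in range(g)) for i in range(g)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]  # keep the vector at unit length
    return v


def weight_conservation(w_a: List[float], w_b: List[float]) -> float:
    """Hypothetical score: absolute cosine similarity of two weight vectors.
    The absolute value is used because the sign of a principal component is arbitrary."""
    dot = sum(a * b for a, b in zip(w_a, w_b))
    na = math.sqrt(sum(a * a for a in w_a))
    nb = math.sqrt(sum(b * b for b in w_b))
    return abs(dot) / (na * nb)
```

In use, you would call `eigengene_weights` on the same signature genes in two cancer types, listing the genes in the same order. A `weight_conservation` value close to 1 would then indicate that the signature keeps its internal structure across the two contexts.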
Highlights
The majority of studies exploring gene expression data result in one or more gene signatures, i.e., lists of genes sharing a common expression pattern that can be used to classify groups of samples in any independent dataset.
A large compendium of gene expression data from The Cancer Genome Atlas (TCGA), covering 32 solid cancer types, was used to restrict the input collection of 12,096 gene signatures to 962 informative ones.
Signature compendia pose two main challenges, related to the reliability and the redundancy of the collected gene sets.
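One simple way to quantify compositional redundancy (large overlap between gene sets) is the Jaccard index: the size of the intersection divided by the size of the union. The sketch below flags pairs of signatures whose overlap reaches a threshold. It is illustrative only. The 0.5 cut-off, the function names, and the example gene sets are assumptions and are not taken from the study.

```python
from itertools import combinations
from typing import Dict, List, Set, Tuple


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Size of the intersection divided by the size of the union (0.0 for two empty sets)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def redundant_pairs(signatures: Dict[str, Set[str]],
                    threshold: float = 0.5) -> List[Tuple[str, str, float]]:
    """Hypothetical helper: return signature pairs whose gene overlap is >= threshold."""
    hits = []
    # Sorting by name makes the output order deterministic.
    for (name_a, genes_a), (name_b, genes_b) in combinations(sorted(signatures.items()), 2):
        j = jaccard(genes_a, genes_b)
        if j >= threshold:
            hits.append((name_a, name_b, j))
    return hits


# Illustrative gene sets: SET_A and SET_B share 3 of 5 distinct genes (Jaccard 0.6).
example = {
    "SET_A": {"MKI67", "TOP2A", "CCNB1", "BUB1"},
    "SET_B": {"MKI67", "TOP2A", "CCNB1", "CDK1"},
    "SET_C": {"CD3E", "CD8A", "GZMB"},
}
print(redundant_pairs(example))
```

Overlap alone captures only compositional redundancy. Two signatures with no genes in common can still be functionally redundant if their read-outs are correlated.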
Summary
The majority of studies exploring gene expression data result in one or more gene signatures, i.e., lists of genes sharing a common expression pattern that can be used to classify groups of samples in any independent dataset. Not all the signatures contained in signature compendia are informative, and the number of gene sets representing each biological process is unbalanced. These two phenomena affect the results of classical transcriptomic data analysis. Testing many non-informative or over-represented gene sets inflates the number of comparisons, and the resulting heavy p-value corrections produce a high number of false-negative results. Two signatures may represent two different transcriptional read-outs of the same biological process; we refer to such signatures as functionally redundant. The existence of multiple functionally redundant signatures also affects the results of classical transcriptomic data analysis, because multiple gene sets belonging to analogous or related biological processes receive high scores. These multiple comparisons of redundant signatures can potentially hide relevant hits. Any estimate of functional redundancy is conditioned by the context (e.g., a particular cancer type) and depends on the type of data used to evaluate the redundancy.
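Functional redundancy, as described above, can be sketched as the correlation between the read-outs of two signatures across the samples of a single context, for example one cancer type. The sketch below assumes a simple read-out: the per-sample mean z-score of the signature genes. This is a stand-in chosen for brevity and not necessarily the scoring used in the study, which refers to eigengenes. The gene names and function names are hypothetical.

```python
import math
from typing import Dict, List

Expr = Dict[str, List[float]]  # gene -> expression values across the samples of ONE context


def pearson(x: List[float], y: List[float]) -> float:
    """Pearson correlation coefficient of two equal-length vectors."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def signature_score(expr: Expr, genes: List[str]) -> List[float]:
    """Assumed read-out: per-sample mean z-score of the signature genes present in `expr`."""
    present = [g for g in genes if g in expr]
    n = len(expr[present[0]])
    z_rows = []
    for g in present:
        row = expr[g]
        m = sum(row) / n
        sd = math.sqrt(sum((v - m) ** 2 for v in row) / n) or 1.0  # guard constant genes
        z_rows.append([(v - m) / sd for v in row])
    return [sum(col) / len(z_rows) for col in zip(*z_rows)]


def functional_redundancy(expr: Expr, sig_a: List[str], sig_b: List[str]) -> float:
    """Hypothetical helper: correlation of two signature read-outs within one context."""
    return pearson(signature_score(expr, sig_a), signature_score(expr, sig_b))


# Toy data for one context: sig_a and sig_b share no genes but follow the same trend.
expr = {
    "G1": [1, 2, 3, 4, 5, 6], "G2": [2, 3, 4, 5, 6, 8],
    "G3": [1, 2, 2, 4, 5, 7], "G4": [0, 2, 3, 3, 6, 6],
    "G5": [3, 1, 3, 1, 3, 1], "G6": [2, 1, 2, 1, 2, 1],
}
sig_a, sig_b, sig_c = ["G1", "G2"], ["G3", "G4"], ["G5", "G6"]
print(functional_redundancy(expr, sig_a, sig_b))
```

Because the correlation is computed from the samples of a single context, running the same pair on data from another cancer type can give a different value. This is consistent with the point that any estimate of functional redundancy is context-dependent.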