Abstract
Background: Recent advances in the analysis of high-throughput expression data have led to the development of tools that scaled up their focus from the single-gene to the gene-set level. For example, the popular Gene Set Enrichment Analysis (GSEA) algorithm can detect moderate but coordinated expression changes in groups of presumably related genes between pairs of experimental conditions. This considerably improves the extraction of information from high-throughput gene expression data. However, although many gene sets covering a wide range of biological fields are available in public databases, generating home-made gene sets relevant to one's biological question is crucial yet remains a substantial challenge for most biologists lacking statistical or bioinformatics expertise. This is all the more true when attempting to define a gene set specific to one condition compared with many others. Thus, there is a crucial need for easy-to-use software for generating relevant home-made gene sets from complex datasets, using them in GSEA, and correcting the results when they are applied to multiple comparisons of many experimental conditions.
Result: We developed BubbleGUM (GSEA Unlimited Map), a tool that automatically extracts molecular signatures from transcriptomic data and performs exhaustive GSEA with multiple testing correction. One original feature of BubbleGUM is its capacity to integrate and compare numerous GSEA results in an easy-to-grasp graphical representation. We applied our method to generate transcriptomic fingerprints for murine cell types and to assess their enrichment in human cell types. This analysis allowed us to confirm homologies between mouse and human immunocytes.
Conclusions: BubbleGUM is open-source software that automatically generates molecular signatures from complex expression datasets and directly assesses their enrichment by GSEA on independent datasets. Enrichments are displayed in a graphical output that helps interpret the results. This innovative methodology has recently been used to answer important questions in functional genomics, such as the degree of similarity between microarray datasets from different laboratories or with different experimental models or clinical cohorts. BubbleGUM is executable through an intuitive interface so that both bioinformaticians and biologists can use it. It is available at http://www.ciml.univ-mrs.fr/applications/BubbleGUM/index.html.
Electronic supplementary material: The online version of this article (doi:10.1186/s12864-015-2012-4) contains supplementary material, which is available to authorized users.
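To make the idea of a gene set "specific to one condition compared with many others" concrete, here is a minimal Python sketch. It is not BubbleGUM's implementation: the selection rule (expression in the test condition must exceed the highest expression in every other condition by a fold-change threshold), the function name `extract_signature`, the parameter `min_ratio`, and the toy data are all assumptions made for illustration. Expression values are assumed to be positive and on a linear scale.

```python
from typing import Dict, List

# Toy expression table (hypothetical): gene -> {condition: expression value}.
EXAMPLE_EXPR: Dict[str, Dict[str, float]] = {
    "Gene1": {"A": 900.0, "B": 100.0, "C": 120.0},
    "Gene2": {"A": 200.0, "B": 210.0, "C": 190.0},
    "Gene3": {"A": 50.0, "B": 400.0, "C": 60.0},
}


def extract_signature(
    expr: Dict[str, Dict[str, float]],
    test: str,
    min_ratio: float = 1.5,
) -> List[str]:
    """Return genes whose expression in `test` is at least `min_ratio`
    times their highest expression in any other condition.

    This "one versus all others" rule is an illustrative assumption,
    not the method documented for BubbleGUM.
    """
    signature = []
    for gene, values in expr.items():
        reference = [v for cond, v in values.items() if cond != test]
        if test not in values or not reference:
            continue  # gene not measured in enough conditions
        if values[test] >= min_ratio * max(reference):
            signature.append(gene)
    return sorted(signature)


if __name__ == "__main__":
    for condition in ("A", "B", "C"):
        print(condition, extract_signature(EXAMPLE_EXPR, condition))
```

Comparing against the maximum of the other conditions, rather than their mean, is a deliberately strict choice: a gene that is also high in just one other condition is excluded from the signature.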
Highlights
Recent advances in the analysis of high-throughput expression data have led to the development of tools that scaled up their focus from the single-gene to the gene-set level
BubbleGUM is executable through an intuitive interface so that both bioinformaticians and biologists can use it
When comparing results from different laboratories or generated on different platforms, given the biological and technical variability, the regulation of a gene set is more reproducible than the regulation of a single gene [6,7,8,9,10,11,12,13]
Summary
Recent advances in the analysis of high-throughput expression data have led to the development of tools that scaled up their focus from the single-gene to the gene-set level. An additional strength of GSEA is that it makes better use of the ever-increasing knowledge on gene networks and their relationships with biological processes, such as documented contribution to a given biological function (as informed by gene ontology or pathway analyses), co-expression across a variety of conditions, predicted regulation by a common set of transcription factors, or association with specific diseases (as informed by genome-wide association studies). Thousands of such gene sets have been carefully curated and grouped in public databases such as the Molecular Signatures Database (MSigDB) [3, 4] or the Stanford Microarray Database (SMD) [5]. BubbleGUM can be used more broadly to facilitate multi-omics analyses, since it essentially estimates the degree of correlation between lists of molecules associated with intensity signals of any kind. These include signals from mRNA hybridization experiments (microarrays), from sequencing assays encompassing epigenetic and RNA-seq data, and from mass spectrometry data for proteomics; the approach should also be applicable to metabolomics.
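The enrichment test at the core of GSEA, and the need to correct results across many comparisons, can be sketched as follows. This is a simplified illustration, not BubbleGUM's or the GSEA software's code. The running-sum statistic follows the weighted form commonly described for GSEA (weight exponent 1), without the permutation step that GSEA uses to assess significance. The Benjamini-Hochberg procedure is swapped in here as one standard correction; the text above does not state which correction BubbleGUM applies. The function names and toy data are hypothetical.

```python
from typing import Dict, List, Set


def enrichment_score(
    ranked_genes: List[str],
    scores: Dict[str, float],
    gene_set: Set[str],
) -> float:
    """Running-sum enrichment score for `gene_set` in a ranked gene list.

    `ranked_genes` is ordered from most up-regulated in condition 1 to
    most up-regulated in condition 2; `scores` holds the ranking metric.
    Returns the running-sum value with the largest absolute deviation
    from zero (positive: enriched at the top, negative: at the bottom).
    """
    hits = [g for g in ranked_genes if g in gene_set]
    n_miss = len(ranked_genes) - len(hits)
    hit_total = sum(abs(scores[g]) for g in hits)
    if not hits or n_miss == 0 or hit_total == 0:
        raise ValueError("gene set must partly overlap the ranked list")
    running, best = 0.0, 0.0
    for gene in ranked_genes:
        if gene in gene_set:
            running += abs(scores[gene]) / hit_total  # step up on a hit
        else:
            running -= 1.0 / n_miss  # step down on a miss
        if abs(running) > abs(best):
            best = running
    return best


def benjamini_hochberg(pvalues: List[float]) -> List[float]:
    """Benjamini-Hochberg adjusted p-values, returned in input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    smallest = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        smallest = min(smallest, pvalues[i] * m / rank)
        adjusted[i] = smallest
    return adjusted


if __name__ == "__main__":
    ranked = ["g1", "g2", "g3", "g4", "g5", "g6"]
    metric = {"g1": 3.0, "g2": 2.0, "g3": 1.0,
              "g4": -1.0, "g5": -2.0, "g6": -3.0}
    print(enrichment_score(ranked, metric, {"g1", "g2"}))
    print(enrichment_score(ranked, metric, {"g5", "g6"}))
    print(benjamini_hochberg([0.01, 0.04, 0.03, 0.20]))
```

In the toy example, a gene set concentrated at the top of the ranking yields a positive score and one concentrated at the bottom yields a negative score. The adjusted p-values show how testing several gene sets or pairwise comparisons at once raises the evidence required for each individual result.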