Abstract
Background
Gaining biological insights into molecular responses to treatments or diseases from omics data can be accomplished by gene set or pathway enrichment methods. A plethora of different tools and algorithms have been developed so far. Among those, gene set enrichment analysis (GSEA) has proven to control both type I and type II errors well. In recent years, the call for a combined analysis of multiple omics layers has become prominent, giving rise to a few multi-omics enrichment tools. Each of these has its own drawbacks and restrictions regarding universal application.
Results
Here, we present the multiGSEA package, which calculates a combined GSEA-based pathway enrichment across multiple omics layers. The package queries 8 different pathway databases and relies on the robust GSEA algorithm for single-omics enrichment analysis. In a final step, these single-omics scores are combined into a robust composite multi-omics pathway enrichment measure. multiGSEA supports 11 different organisms and includes a comprehensive mapping of transcript, protein, and metabolite IDs.
Conclusions
With multiGSEA, we introduce a highly versatile tool for multi-omics pathway integration that minimizes previous restrictions in terms of omics layer selection, pathway database availability, organism selection, and the mapping of omics feature identifiers. multiGSEA is publicly available under the GPL-3 license at https://github.com/yigbt/multiGSEA and on Bioconductor: https://bioconductor.org/packages/multiGSEA.
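The abstract does not say how the single-omics scores are merged into the composite measure. One standard way to combine p-values from independent tests is Stouffer's Z method, sketched below under that assumption; multiGSEA may use a different or configurable method. The function `stouffer_combine` and the per-layer p-values are hypothetical illustrations, not part of the package.

```python
import math
from statistics import NormalDist

def stouffer_combine(pvalues, weights=None):
    """Combine one-sided p-values of independent tests with Stouffer's Z method."""
    if not pvalues or any(not 0.0 < p < 1.0 for p in pvalues):
        raise ValueError("need one or more p-values strictly between 0 and 1")
    if weights is None:
        weights = [1.0] * len(pvalues)
    std_normal = NormalDist()  # mean 0, standard deviation 1
    # Map each p-value to a z-score; a smaller p gives a larger positive z.
    z_scores = [-std_normal.inv_cdf(p) for p in pvalues]
    z_combined = sum(w * z for w, z in zip(weights, z_scores)) / math.sqrt(
        sum(w * w for w in weights)
    )
    # Upper-tail probability of the combined z-score.
    return std_normal.cdf(-z_combined)

# Hypothetical per-layer p-values for a single pathway.
layer_p = {"transcriptome": 0.04, "proteome": 0.10, "metabolome": 0.03}
print(stouffer_combine(list(layer_p.values())))
```

Mapping p to z via `-inv_cdf(p)` (rather than `inv_cdf(1 - p)`) avoids losing precision for very small p-values. The method assumes the layers are tested independently, which is an idealization for omics layers measured on the same samples.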
Highlights
Gaining biological insights into molecular responses to treatments or diseases from omics data can be accomplished by gene set or pathway enrichment methods.
Through different statistical techniques, such as over-representation analysis (ORA) or gene set enrichment analysis (GSEA), these methods can identify specific sets of genes or molecular response/signaling pathways that are triggered by a particular treatment or disease.
A comprehensive vignette of the multiGSEA package can be found in our git repository or on the Bioconductor package website.
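To contrast the two techniques named above: ORA typically asks whether a pathway contains more genes from a pre-selected list (for example, significantly changed genes) than chance would predict, using a one-sided hypergeometric test. The minimal sketch below assumes that formulation; `ora_pvalue` and its arguments are hypothetical, and multiGSEA itself relies on GSEA rather than ORA.

```python
from math import comb

def ora_pvalue(n_universe, n_pathway, n_selected, n_overlap):
    """Upper-tail hypergeometric p-value P(X >= n_overlap).

    n_universe: all measured genes; n_pathway: genes in the pathway;
    n_selected: genes in the pre-selected list; n_overlap: selected genes
    that fall into the pathway.
    """
    total = comb(n_universe, n_selected)
    upper = min(n_pathway, n_selected)
    # Sum the probabilities of observing n_overlap or more pathway genes.
    tail = sum(
        comb(n_pathway, k) * comb(n_universe - n_pathway, n_selected - k)
        for k in range(n_overlap, upper + 1)
    )
    return tail / total

# Hypothetical counts: 20,000 genes, a 100-gene pathway, 500 selected genes, 10 overlapping.
print(ora_pvalue(20000, 100, 500, 10))
```

ORA discards everything except the selected/not-selected split, whereas GSEA works on the full ranked list, which is one reason rank-based methods are often preferred.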
Summary
Example use case
In the following, we illustrate a use case scenario on human mitochondrial stress data. (ii) Use the devtools library [34] to install the package directly from our git repository.
Example data and pathway definitions
At the beginning, we need to set up several prerequisites. This includes loading the package itself and the packages needed to map omics feature IDs, such as transcript IDs or metabolite IDs (i). multiGSEA works with nested lists in which each sublist represents an omics layer. Such a data structure is initialized with the initOmicsDataStructure() command. The feature ranks are calculated separately for each of the applied omics layers.
Run pathway enrichment
Now that we have ranked omics features and pre-formatted pathway definitions, we can calculate GSEA-based pathway enrichments for each omics layer separately by means of multiGSEA. The pathway enrichment within multiGSEA is done by the fgsea package [19], which can efficiently and accurately calculate arbitrarily low GSEA p values for a collection of feature sets. We used the p.adjust() command to apply a Benjamini/Hochberg correction [39].
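The per-layer workflow described above (rank features in each omics layer, score each pathway with a GSEA statistic, then correct for multiple testing) can be sketched as follows. This is an illustration under stated assumptions, not multiGSEA's code: the ranking metric sign(logFC) × -log10(p) is assumed, the enrichment score is the classic weighted running-sum GSEA statistic, and the Benjamini/Hochberg function is intended to mirror R's `p.adjust(method = "BH")`. All names (`rank_features`, `enrichment_score`, `benjamini_hochberg`, the `omics` dict, `pathway`) are hypothetical. fgsea's p-value estimation is not reproduced; note that a naive permutation test with n permutations cannot report p-values below roughly 1/(n + 1).

```python
import math

def rank_features(log_fc, pvalues):
    """Score each feature as sign(logFC) * -log10(p) (assumed ranking metric)."""
    scores = {}
    for feature, fc in log_fc.items():
        sign = (fc > 0) - (fc < 0)  # +1, 0 or -1, like R's sign()
        scores[feature] = sign * -math.log10(pvalues[feature])
    return scores

def enrichment_score(ranks, gene_set, weight=1.0):
    """Weighted running-sum GSEA enrichment score of gene_set in ranks."""
    ordered = sorted(ranks.items(), key=lambda kv: kv[1], reverse=True)
    hits = [feature in gene_set for feature, _ in ordered]
    n_misses = len(ordered) - sum(hits)
    hit_norm = sum(abs(s) ** weight for (_, s), hit in zip(ordered, hits) if hit)
    if hit_norm == 0 or n_misses == 0:
        raise ValueError("gene set needs non-zero-scored hits and at least one miss")
    running, best = 0.0, 0.0
    for (_, score), hit in zip(ordered, hits):
        if hit:
            running += abs(score) ** weight / hit_norm  # step up on a pathway member
        else:
            running -= 1.0 / n_misses  # step down on a non-member
        if abs(running) > abs(best):
            best = running  # keep the maximum deviation from zero
    return best

def benjamini_hochberg(pvalues):
    """BH-adjusted p-values in input order (intended to mirror p.adjust(method = "BH"))."""
    n = len(pvalues)
    order = sorted(range(n), key=lambda i: pvalues[i], reverse=True)
    adjusted = [0.0] * n
    running_min = 1.0
    for position, i in enumerate(order):
        rank = n - position  # 1-based rank in ascending p-value order
        running_min = min(running_min, pvalues[i] * n / rank)
        adjusted[i] = running_min
    return adjusted

# Hypothetical toy input: one dict per omics layer, mirroring the nested-list idea.
omics = {
    "transcriptome": {"logFC": {"g1": 2.0, "g2": 1.0, "g3": -0.5, "g4": -2.0},
                      "pvalue": {"g1": 0.001, "g2": 0.02, "g3": 0.3, "g4": 0.005}},
    "proteome": {"logFC": {"g1": 1.2, "g2": -0.3, "g3": 0.1, "g4": -1.8},
                 "pvalue": {"g1": 0.01, "g2": 0.4, "g3": 0.8, "g4": 0.002}},
}
pathway = {"g1", "g2"}  # hypothetical pathway definition

for layer, data in omics.items():
    ranks = rank_features(data["logFC"], data["pvalue"])  # ranked per layer
    print(layer, round(enrichment_score(ranks, pathway), 3))
```

Keeping one dict per layer means each layer is ranked and scored independently, matching the separate per-layer enrichment described in the text; the per-layer results would then be combined and adjusted afterwards.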