Abstract
Background: Gene Set Enrichment Analysis (GSEA) is a powerful tool to identify enriched functional categories of informative biomarkers. Canonical GSEA takes one-dimensional feature scores derived from the data of one platform as inputs. Numerous extensions of GSEA handling multimodal OMIC data have been proposed, yet none of them explicitly captures combinatorial relations of feature scores from multiple platforms.
Results: We propose multivariate GSEA (MGSEA) to capture combinatorial relations of gene set enrichment among multiple platform features. MGSEA successfully captures designed feature relations from simulated data. By applying it to the scores for delineating breast cancer and glioblastoma multiforme (GBM) subtypes from The Cancer Genome Atlas (TCGA) datasets of CNV, DNA methylation and mRNA expression, we find that breast cancer and GBM data yield both similar and distinct outcomes. Among the enriched functional categories, subtype-specific biomarkers are dominated by mRNA expression in many functional categories in both cancer types and also by CNV in many functional categories in breast cancer. The enriched functional categories belonging to distinct combinatorial patterns are involved in different oncogenic processes: cell proliferation (such as cell cycle control, estrogen responses, MYC and E2F targets) for mRNA expression in breast cancer, invasion and metastasis (such as cell adhesion and epithelial-mesenchymal transition (EMT)) for CNV in breast cancer, and diverse processes (such as immune and inflammatory responses, cell adhesion, angiogenesis, and EMT) for mRNA expression in GBM. These observations persist in two external datasets (Molecular Taxonomy of Breast Cancer International Consortium (METABRIC) for breast cancer and Repository for Molecular Brain Neoplasia Data (REMBRANDT) for GBM) and are consistent with knowledge of cancer subtypes. We further compare the characteristics of MGSEA with several extensions of GSEA and point out the pros and cons of each method.
Conclusions: We demonstrated the utility of MGSEA by inferring the combinatorial relations of multiple platforms for cancer subtype delineation in three multi-OMIC datasets: TCGA, METABRIC and REMBRANDT. The inferred combinatorial patterns are consistent with current knowledge and also reveal novel insights about cancer subtypes. MGSEA can be further applied to any genotype-phenotype association problem with multimodal OMIC data.
Highlights
Gene Set Enrichment Analysis (GSEA) is a powerful tool to identify enriched functional categories of informative biomarkers
Analysis results indicated that mRNA expression was a dominant feature in many functional categories of both cancer types, and copy number variation (CNV) was a dominant feature in many functional categories of breast cancer
We further investigated a problem of delineating cancer subtypes with biomarkers extracted from multi-OMIC data, and applied multivariate GSEA (MGSEA) to The Cancer Genome Atlas (TCGA) breast cancer and glioblastoma multiforme (GBM) data to identify the combinatorial relations of gene set enrichment information from multiple platforms
Summary
Gene Set Enrichment Analysis (GSEA) is a powerful tool to identify enriched functional categories of informative biomarkers. Given a score for each gene and a gene set of interest, the goal is to assess whether the high-scoring genes are enriched with members of the gene set. To achieve this goal, GSEA sorts genes by their scores and establishes a random walk along the sorted genes. The walk advances one step when hitting a member of the gene set and reverses one step otherwise. The level of enrichment and its statistical significance are quantified by the maximum positive distance from the origin during the random walk. This simple yet powerful method is applicable to a wide range of bioinformatics problems. For example, one may evaluate the scores of differential expression between the transcriptomic data of tumor and normal samples and find the enriched functional categories of top-ranking biomarkers.
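The random walk described in this summary can be illustrated with a minimal Python sketch. It follows the simplified description given here (unit steps forward on a gene set member, unit steps back otherwise) and is not the authors' implementation; standard GSEA software typically uses weighted or normalized step sizes. The function name `enrichment_score`, the gene labels, and the toy scores are hypothetical and chosen only for illustration.

```python
from typing import Dict, Set


def enrichment_score(scores: Dict[str, float], gene_set: Set[str]) -> int:
    """Maximum positive distance from the origin of a unit-step random walk.

    Assumption: a simplified, unweighted walk as described in the summary,
    not the weighted statistic used by standard GSEA software.
    """
    # Sort genes from the highest score to the lowest.
    ranked = sorted(scores, key=scores.get, reverse=True)

    position = 0       # current distance from the origin
    max_position = 0   # largest positive distance seen so far
    for gene in ranked:
        # Advance one step on a gene set member, reverse one step otherwise.
        position += 1 if gene in gene_set else -1
        max_position = max(max_position, position)
    return max_position


# Toy example: hypothetical genes with hypothetical differential scores.
toy_scores = {"A": 2.5, "B": 2.1, "C": 1.7, "D": 0.4, "E": -0.3, "F": -1.2}

print(enrichment_score(toy_scores, {"A", "B", "D"}))  # members near the top
print(enrichment_score(toy_scores, {"E", "F"}))       # members near the bottom
```

A gene set whose members cluster near the top of the ranking drives the walk upward early and yields a larger value than a set whose members sit near the bottom. In practice, statistical significance is usually assessed by comparing the observed value with values obtained from permuted data, a step this sketch omits.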