Abstract

Gene set analysis plays a critical role in the functional interpretation of omics data. Although this is typically done for one omics experiment at a time, there is an increasing need to combine gene set analysis results from multiple experiments performed on the same or different omics platforms, such as in multi-omics studies. Integrating results from multiple experiments is challenging, and annotation redundancy between gene sets further obscures clear conclusions. We propose to use a weighted set cover algorithm to reduce redundancy of gene sets identified in a single experiment. Next, we use affinity propagation to consolidate similar gene sets identified from multiple experiments into clusters and to automatically determine the most representative gene set for each cluster. Using three examples from over-representation analysis and gene set enrichment analysis, we showed that weighted set cover outperformed a previously published set cover method and reduced the number of gene sets by 52-77%. Focusing on overlapping genes between the list of input genes and the enriched gene sets in over-representation analysis and leading-edge genes in gene set enrichment analysis further reduced the number of gene sets. A use case combining enrichment analysis results from RNA-Seq and proteomics data comparing basal and luminal A breast cancer samples highlighted the known difference in proliferation and DNA damage response. Finally, we used these algorithms for a pan-cancer survival analysis. Our analysis clearly revealed prognosis-related pathways common to multiple cancer types or specific to individual cancer types, as well as pathways associated with prognosis in different directions in different cancer types. We implemented these two algorithms in an R package, Sumer, which generates tables and static and interactive plots for exploration and publication. Sumer is publicly available at https://github.com/bzhanglab/sumer.

Highlights

  • The set cover algorithm selected 78 of the enriched pathways associated with basal compared with luminal A breast cancer in RNA data, whereas the weighted set cover required only 51 of those same pathways (Fig. 2B)

  • We propose to use a weighted set cover algorithm to reduce redundancy of gene sets identified in a single experiment

  • Gene set enrichment analysis (GSEA) was used for pathway enrichment of genes and proteins differentially expressed between the basal and luminal A breast cancer subtype tumors in The Cancer Genome Atlas (TCGA) study
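
The redundancy-reduction step named in the highlights can be illustrated with a greedy weighted set cover. The sketch below is not Sumer's implementation: the function name, the toy gene sets, and the `costs` values are all hypothetical, and it simply assumes that each gene set carries a positive cost (for example, one derived from its enrichment significance, lower meaning more preferred). At each step it picks the set with the lowest cost per newly covered gene.

```python
def weighted_set_cover(gene_sets, costs):
    """Greedy weighted set cover (illustrative sketch, not Sumer's code).

    gene_sets: dict mapping a gene set name to a set of gene identifiers
    costs:     dict mapping a gene set name to a positive cost
               (hypothetical weighting; lower cost = preferred set)
    Returns the names of the selected gene sets in selection order.
    """
    uncovered = set().union(*gene_sets.values())
    remaining = dict(gene_sets)
    selected = []
    while uncovered:
        # Only sets that still cover something new are candidates.
        candidates = [n for n in remaining if remaining[n] & uncovered]
        # Lowest cost per newly covered gene wins; the name breaks ties.
        best = min(
            candidates,
            key=lambda n: (costs[n] / len(remaining[n] & uncovered), n),
        )
        selected.append(best)
        uncovered -= remaining.pop(best)
    return selected


# Toy input: sets A, B and C overlap heavily, D is unrelated.
gene_sets = {
    "A": {"g1", "g2", "g3"},
    "B": {"g1", "g2", "g3", "g4"},
    "C": {"g1", "g2", "g4"},
    "D": {"g5", "g6"},
}
costs = {"A": 1.0, "B": 0.5, "C": 1.0, "D": 1.0}  # assumed weights

selected = weighted_set_cover(gene_sets, costs)
print(selected)
```

In this toy input, B covers every gene found in A and C at a lower cost, so the redundant sets A and C are never selected, while D is kept because it covers genes nothing else does.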

Summary

Graphical Abstract

Weighted set cover and affinity propagation algorithms are used to combine results from multiple enrichment analyses. We used the affinity propagation algorithm [20], which groups functionally related gene sets identified from multiple experiments or omics platforms into clusters and automatically determines the most representative gene set for each cluster. We implemented both the weighted set cover and affinity propagation algorithms in an R package named Sumer. Sumer first reduces annotation redundancy in the results from an individual enrichment analysis using weighted set cover. It then clusters the results from multiple enrichment analyses using affinity propagation and provides tables, static and interactive plots, and downloadable results for exploration and publication. We demonstrate its efficiency in gene set redundancy removal and its application to multi-omics and pan-cancer studies.
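
The clustering step can be sketched in the same spirit. The code below is a small pure-Python version of the standard affinity propagation message-passing updates (responsibilities and availabilities with damping), written for illustration only and not taken from Sumer. Several choices are assumptions: Jaccard overlap as the similarity between gene sets, the median off-diagonal similarity as the shared preference, a damping factor of 0.5, a fixed number of iterations, and the toy gene sets themselves.

```python
from statistics import median


def jaccard(a, b):
    """Jaccard overlap between two gene sets (assumed similarity measure)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def affinity_propagation(S, damping=0.5, iterations=200):
    """Return, for each item i, the index of its exemplar.

    S is a square similarity matrix whose diagonal holds the preference.
    """
    n = len(S)
    R = [[0.0] * n for _ in range(n)]  # responsibilities
    A = [[0.0] * n for _ in range(n)]  # availabilities
    for _ in range(iterations):
        # r(i,k) = s(i,k) - max over k' != k of (a(i,k') + s(i,k'))
        for i in range(n):
            scores = [A[i][k] + S[i][k] for k in range(n)]
            for k in range(n):
                rival = max(v for j, v in enumerate(scores) if j != k)
                R[i][k] = damping * R[i][k] + (1 - damping) * (S[i][k] - rival)
        # a(i,k) = min(0, r(k,k) + sum of positive r(i',k), i' not in {i,k})
        # a(k,k) = sum of positive r(i',k), i' != k
        for k in range(n):
            pos = [max(0.0, R[i][k]) for i in range(n)]
            pos[k] = R[k][k]  # self-responsibility enters unclipped
            total = sum(pos)
            for i in range(n):
                new = total - pos[i]
                if i != k:
                    new = min(0.0, new)
                A[i][k] = damping * A[i][k] + (1 - damping) * new
    return [max(range(n), key=lambda k: A[i][k] + R[i][k]) for i in range(n)]


# Toy input: two groups of overlapping gene sets, e.g. from two experiments.
gene_sets = {
    "A": {"g1", "g2", "g3", "g4"},
    "B": {"g1", "g2", "g3"},
    "C": {"g2", "g3", "g4"},
    "D": {"g5", "g6", "g7", "g8"},
    "E": {"g5", "g6", "g7"},
    "F": {"g6", "g7", "g8"},
}
names = sorted(gene_sets)
S = [[jaccard(gene_sets[a], gene_sets[b]) for b in names] for a in names]
n = len(names)
# Assumed preference: median of the off-diagonal similarities.
preference = median(S[i][k] for i in range(n) for k in range(n) if i != k)
for k in range(n):
    S[k][k] = preference

exemplars = affinity_propagation(S)
clusters = {}
for name, ex in zip(names, exemplars):
    clusters.setdefault(names[ex], []).append(name)
print(clusters)
```

For this toy input the two groups separate cleanly, and the broadest set in each group (A and D) is chosen as the representative. The preference value controls how many clusters emerge, so real enrichment results may need a different setting.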
