Abstract
Background: Gene set enrichment analysis (GSEA) uses gene-level univariate associations to identify gene set-phenotype associations for hypothesis generation and interpretation. We propose that GSEA can be adapted to incorporate SNP and gene-level interactions. To this end, gene scores are derived by Relief-based feature importance algorithms that efficiently detect both univariate and interaction effects (MultiSURF) or exclusively interaction effects (MultiSURF*). We compare these interaction-sensitive GSEA approaches to traditional χ² rankings in simulated genome-wide array data, and in a target and replication cohort of congenital heart disease patients with conotruncal defects (CTDs).
Results: In the simulation study and for both CTD datasets, both Relief-based approaches to GSEA captured more relevant and significant gene ontology terms compared to the univariate GSEA. Key terms and themes of interest include cell adhesion, migration, and signaling. A leading edge analysis highlighted semaphorins and their receptors, the Slit-Robo pathway, and other genes with roles in the secondary heart field and outflow tract development.
Conclusions: Our results indicate that interaction-sensitive approaches to enrichment analysis can improve upon traditional univariate GSEA. This approach replicated univariate findings and identified additional and more robust support for the role of the secondary heart field and cardiac neural crest cell migration in the development of CTDs.
Highlights
Gene set enrichment analysis (GSEA) has emerged as a useful approach to hypothesis generation
We demonstrate the efficacy of using Relief-based algorithm (RBA) feature scores for ranking in real-world data by comparing (1) univariate analysis ranking, (2) MultiSURF ranking, and (3) MultiSURF* ranking, in concert with GSEA using genome-wide genotype data from two cohorts with congenital heart disease (CHD) as the target disease phenotype
For the CHD data, we include the top 15 Gene Ontology (GO) terms identified by the pre-ranked GSEA for each of the three analysis strategies, i.e., univariate, MultiSURF and MultiSURF*, applied to both cohorts (Fig. 1, step 6)
Summary
Gene set enrichment analysis (GSEA) has emerged as a useful approach to hypothesis generation. Enrichment analyses are typically conducted using either self-contained or competitive hypothesis testing [6]. The latter tests the magnitude of phenotype association of the genes in a gene set against that of the remaining genes in the genome. We propose that GSEA can be adapted to incorporate SNP and gene-level interactions. To this end, gene scores are derived by Relief-based feature importance algorithms that efficiently detect both univariate and interaction effects (MultiSURF) or exclusively interaction effects (MultiSURF*). We compare these interaction-sensitive GSEA approaches to traditional χ² rankings in simulated genome-wide array data, and in a target and replication cohort of congenital heart disease patients with conotruncal defects (CTDs).
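To make the competitive, pre-ranked idea concrete, the sketch below computes the standard weighted running-sum enrichment score for one gene set over a ranked gene list, along with the leading-edge genes. It is an illustration only, not the authors' pipeline: the function name `enrichment_score`, the weight exponent `p`, the gene identifiers `G1` to `G6`, and the scores are all hypothetical, and the scores could equally come from χ², MultiSURF, or MultiSURF* rankings. Permutation-based significance testing is omitted.

```python
def enrichment_score(ranked_genes, scores, gene_set, p=1.0):
    """Weighted running-sum enrichment score for a pre-ranked gene list.

    ranked_genes: gene identifiers sorted from highest to lowest score.
    scores:       gene-level scores in the same order as ranked_genes.
    gene_set:     set of gene identifiers belonging to the gene set.
    p:            exponent applied to |score| when weighting hits (assumed 1.0).
    Returns (enrichment score, leading-edge genes).
    """
    in_set = [g in gene_set for g in ranked_genes]
    n_hits = sum(in_set)
    n_miss = len(ranked_genes) - n_hits
    if n_hits == 0 or n_miss == 0:
        raise ValueError("gene set must cover some, but not all, ranked genes")

    # Hits step the running sum up in proportion to |score|**p;
    # misses step it down by a constant, so the walk ends at zero.
    hit_weights = [abs(s) ** p if hit else 0.0 for s, hit in zip(scores, in_set)]
    total = sum(hit_weights)
    if total == 0:
        raise ValueError("all gene-set members have zero score")

    running, best, peak = 0.0, 0.0, 0
    for i, (w, hit) in enumerate(zip(hit_weights, in_set)):
        running += w / total if hit else -1.0 / n_miss
        if abs(running) > abs(best):
            best, peak = running, i

    # Leading edge: set members before the peak (positive score)
    # or after it (negative score).
    if best >= 0:
        window = ranked_genes[: peak + 1]
    else:
        window = ranked_genes[peak + 1:]
    leading_edge = [g for g in window if g in gene_set]
    return best, leading_edge


# Toy example with hypothetical genes and scores.
ranked = ["G1", "G2", "G3", "G4", "G5", "G6"]
toy_scores = [6.0, 5.0, 4.0, 3.0, 2.0, 1.0]
es, edge = enrichment_score(ranked, toy_scores, {"G1", "G2", "G4"})
print(round(es, 3), edge)
```

Because the statistic depends only on the order and magnitude of the gene scores, replacing a univariate ranking with an interaction-sensitive one changes the input to this step without changing the enrichment procedure itself.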