Abstract
Gene set analysis methods aim to determine whether an a priori defined set of genes shows a statistically significant difference in expression with respect to either categorical or continuous outcomes. Although many methods for gene set analysis have been proposed, a systematic analysis tool for identifying different types of gene set significance modules has not previously been developed. This work presents an R package, MAVTgsa, which includes three different methods for integrated gene set enrichment analysis. (1) The one-sided OLS (ordinary least squares) test detects coordinated changes of the genes in a gene set in one direction, either up- or downregulation. (2) The two-sided MANOVA (multivariate analysis of variance) test detects changes in both directions, up- and downregulation, for studies with two or more experimental conditions. (3) A random forests-based procedure identifies gene sets that can accurately predict samples from different experimental conditions or that are associated with continuous phenotypes. MAVTgsa computes the P values and FDR (false discovery rate) q-values for all gene sets in the study. Furthermore, MAVTgsa provides several visualization outputs to support and interpret the enrichment results. This package is available online.
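As an illustration of how per-gene-set P values can be converted into FDR q-values, the following minimal Python sketch applies the Benjamini-Hochberg step-up adjustment. This is an assumption made for illustration only: the abstract does not state which FDR procedure MAVTgsa uses, and the function name `bh_qvalues` and the example P values are hypothetical.

```python
def bh_qvalues(pvalues):
    """Return Benjamini-Hochberg adjusted P values (q-values) in input order.

    Illustrative sketch only; not taken from the MAVTgsa package.
    """
    m = len(pvalues)
    # Indices of the P values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: pvalues[i])
    qvalues = [0.0] * m
    running_min = 1.0
    # Walk from the largest P value down to the smallest so that the
    # adjusted values are monotone non-decreasing in the original P value.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        qvalues[i] = running_min
    return qvalues


if __name__ == "__main__":
    # Hypothetical P values for four gene sets.
    example = [0.01, 0.04, 0.03, 0.005]
    print(bh_qvalues(example))
```

Gene sets whose q-value falls below a chosen threshold would then be reported as significant at that FDR level.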
Highlights
DNA microarray technology enables simultaneous monitoring of the expression level of a large number of genes for a given experimental study
Mootha et al. [4] proposed gene set enrichment analysis (GSEA), which considers the entire distribution of a predefined gene set rather than a subset from the differential expression list
MAVTgsa was applied to a P53 dataset
Summary
DNA microarray technology enables simultaneous monitoring of the expression level of a large number of genes for a given experimental study. Much initial research on methods for data analysis has focused on techniques to identify a list of differentially expressed genes. After a list of differentially expressed genes has been selected, it is compared against biologically predefined gene sets to determine whether any set is overrepresented in the list relative to the whole gene list [1,2,3]. Mootha et al. [4] proposed gene set enrichment analysis (GSEA), which considers the entire distribution of a predefined gene set rather than a subset from the differential expression list. Microarray experiments are subject to various sources of biological and technical variability, and analysis of a gene set is expected to be more reproducible than analysis of an individual gene.
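The overrepresentation step described above is commonly carried out with a one-sided hypergeometric test. The Python sketch below shows that calculation under stated assumptions: the function name `overrepresentation_pvalue`, its parameters, and the example counts are hypothetical, and the cited methods [1,2,3] may use a different test.

```python
from math import comb


def overrepresentation_pvalue(n_universe, n_set, n_selected, n_overlap):
    """One-sided hypergeometric P value for gene set overrepresentation.

    n_universe: total number of genes measured
    n_set:      number of those genes that belong to the gene set
    n_selected: number of genes in the differentially expressed list
    n_overlap:  number of genes in both the list and the gene set

    Returns P(X >= n_overlap), where X is the overlap expected when
    n_selected genes are drawn at random without replacement.
    """
    total = comb(n_universe, n_selected)
    upper = min(n_set, n_selected)
    tail = sum(
        comb(n_set, k) * comb(n_universe - n_set, n_selected - k)
        for k in range(n_overlap, upper + 1)
    )
    return tail / total


if __name__ == "__main__":
    # Hypothetical counts: 10 genes measured, 4 in the gene set,
    # 3 called differentially expressed, 2 of which are in the set.
    print(overrepresentation_pvalue(10, 4, 3, 2))
```

Because this test uses only the genes that pass the differential expression cutoff, it discards the information in the remaining genes, which is the limitation that GSEA-style methods address by using the entire distribution.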