Abstract

BackgroundOver-representation analysis (ORA) detects enrichment of genes within biological categories. Gene Ontology (GO) domains are commonly used for gene/gene-product annotation. When ORA is employed, often times there are hundreds of statistically significant GO terms per gene set. Comparing enriched categories between a large number of analyses and identifying the term within the GO hierarchy with the most connections is challenging. Furthermore, ascertaining biological themes representative of the samples can be highly subjective from the interpretation of the enriched categories.ResultsWe developed goSTAG for utilizing GO Subtrees to Tag and Annotate Genes that are part of a set. Given gene lists from microarray, RNA sequencing (RNA-Seq) or other genomic high-throughput technologies, goSTAG performs GO enrichment analysis and clusters the GO terms based on the p-values from the significance tests. GO subtrees are constructed for each cluster, and the term that has the most paths to the root within the subtree is used to tag and annotate the cluster as the biological theme. We tested goSTAG on a microarray gene expression data set of samples acquired from the bone marrow of rats exposed to cancer therapeutic drugs to determine whether the combination or the order of administration influenced bone marrow toxicity at the level of gene expression. Several clusters were labeled with GO biological processes (BPs) from the subtrees that are indicative of some of the prominent pathways modulated in bone marrow from animals treated with an oxaliplatin/topotecan combination. In particular, negative regulation of MAP kinase activity was the biological theme exclusively in the cluster associated with enrichment at 6 h after treatment with oxaliplatin followed by control. However, nucleoside triphosphate catabolic process was the GO BP labeled exclusively at 6 h after treatment with topotecan followed by control.ConclusionsgoSTAG converts gene lists from genomic analyses into biological themes by enriching biological categories and constructing GO subtrees from over-represented terms in the clusters. The terms with the most paths to the root in the subtree are used to represent the biological themes. goSTAG is developed in R as a Bioconductor package and is available at https://bioconductor.org/packages/goSTAG

Highlights

  • Over-representation analysis (ORA) detects enrichment of genes within biological categories

  • We developed a tool called GO subtrees to tag and annotate genes (goSTAG) to use Gene Ontology (GO) Subtrees to Tag and Annotate Genes within a set. goSTAG visualizes the similarities between over-representations by clustering the pvalues from the statistical tests and labels clusters with the GO term that has the most paths to the root within the subtree generated from all the GO terms in the cluster

  • To demonstrate the utility of goSTAG, we analyzed the differentially expressed genes (DEGs) from gene expression analysis (Affymetrix GeneChip Rat Genome 230 2.0 arrays) of samples acquired from the bone marrow of rats exposed to cancer therapeutic drugs for 1, 6, or 24 h in order to determine whether the combination or the order of administration influenced bone marrow toxicity at the level of gene expression

Read more

Summary

Introduction

Over-representation analysis (ORA) detects enrichment of genes within biological categories. Over-representation analysis (ORA) of genes was developed to identify biological themes [5].

Results
Conclusion
Full Text
Published version (Free)

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call