Abstract

Advanced statistical methods used to analyze high-throughput data such as gene-expression assays result in long lists of “significant genes.” One way to gain insight into the significance of altered expression levels is to determine whether Gene Ontology (GO) terms associated with a particular biological process, molecular function, or cellular component are over- or under-represented in the set of genes deemed significant. This process, referred to as enrichment analysis, profiles a gene-set and is widely used to make sense of the results of high-throughput experiments. In the canonical example, the output dataset is a list of genes differentially expressed in some condition. To determine the biological relevance of a lengthy gene list, the usual solution is to perform enrichment analysis with the GO: we aggregate the annotating GO concepts for each gene in the list and arrive at a profile of the biological processes or mechanisms affected by the condition under study. While GO has been the principal target for enrichment analysis, the methods are generalizable, and we can conduct the same sort of profiling along other ontologies of interest. Just as scientists can ask “Which biological process is over-represented in my set of interesting genes or proteins?”, we can also ask “Which disease (or class of diseases) is over-represented in my set of interesting genes or proteins?” For example, by annotating known protein mutations with disease terms from the ontologies in BioPortal, Mort et al. recently identified a class of diseases (blood coagulation disorders) that were associated with a 14-fold depletion in substitutions at O-linked glycosylation sites. With the availability of tools for automatic annotation of datasets with terms from disease ontologies, there is no reason to restrict enrichment analyses to the GO.
In this chapter, we discuss methods to perform enrichment analysis using any ontology available in the biomedical domain. We review the general methodology of enrichment analysis and its associated challenges, and discuss the novel translational analyses enabled by the existence of public, national computational infrastructure and by the use of disease ontologies in such analyses.

Highlights

  • Advanced statistical methods are most often used to analyze high-throughput data such as gene-expression assays [1,2,3,4,5], the result of which is a long list of “significant genes.” Extracting biological meaning from such lists is a nontrivial and time-consuming task, which is exacerbated by inconsistencies in free-text gene annotations.

  • One way to gain insight into the biological significance of alterations in gene expression levels is to determine whether the Gene Ontology (GO) terms associated with a particular biological process, molecular function, or cellular component are over- or under-represented in the set of genes deemed significant by the statistical analysis [6].

  • Because enrichment analysis with GO is widely accepted and scientifically valuable, we argue that the logical next step is to extend this methodology to other ontologies, such as disease ontologies.

Summary

Introduction

Advanced statistical methods are most often used to analyze high-throughput data such as gene-expression assays [1,2,3,4,5], the result of which is a long list of “significant genes.” Extracting biological meaning from such lists is a nontrivial and time-consuming task, which is exacerbated by inconsistencies in free-text gene annotations. One way to gain insight into the biological significance of alterations in gene expression levels is to determine whether the GO terms associated with a particular biological process, molecular function, or cellular component are over- or under-represented in the set of genes deemed significant by the statistical analysis [6]. This process, often referred to as “enrichment analysis,” can be used to summarize a gene-set [7]. It can also be relevant for other high-throughput measurement modalities, including proteomics, metabolomics, and studies using tissue-microarrays [8]. Note that there is research underway on the use of “pathways” for enrichment analyses, as well as on comparing statistically significant, concordant differences between two biological states as in Gene Set Enrichment Analysis [9]; neither is discussed here.
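To make the idea of over-representation concrete, the following is a minimal sketch of one common way to test it: a one-sided hypergeometric test per term. It is an illustration under stated assumptions, not the chapter's own implementation. The gene identifiers, the term labels `T1` and `T2`, and the function names `hypergeom_sf` and `enrichment` are all hypothetical, and the sketch omits multiple-testing correction, which a real analysis would likely need.

```python
from collections import Counter
from math import comb


def hypergeom_sf(k, M, n, N):
    """P(X >= k) when drawing N genes from M, of which n carry the term."""
    upper = min(n, N)
    total = comb(M, N)
    hits = sum(comb(n, i) * comb(M - n, N - i) for i in range(k, upper + 1))
    return hits / total


def enrichment(study_genes, annotations, reference_genes):
    """Return {term: (count_in_study, p_value)} for over-representation.

    annotations maps a gene to the ontology terms annotating it
    (hypothetical toy structure; any ontology could supply the terms).
    """
    reference = set(reference_genes)
    study = set(study_genes) & reference
    # Count, per term, how many reference genes and study genes carry it.
    ref_counts = Counter(t for g in reference for t in set(annotations.get(g, ())))
    study_counts = Counter(t for g in study for t in set(annotations.get(g, ())))
    return {
        term: (k, hypergeom_sf(k, len(reference), ref_counts[term], len(study)))
        for term, k in study_counts.items()
    }


# Toy data (entirely made up): 10 reference genes, 3 "significant" genes.
reference_genes = [f"g{i}" for i in range(10)]
annotations = {g: {"T2"} for g in reference_genes}  # T2 annotates every gene
for g in ("g0", "g1", "g2"):
    annotations[g] = {"T1", "T2"}                   # T1 annotates only these

results = enrichment(["g0", "g1", "g2"], annotations, reference_genes)
for term, (count, p) in sorted(results.items()):
    print(term, count, round(p, 5))
```

In this toy case, `T1` annotates exactly the three study genes, so its p-value is 1/120, while `T2` annotates every reference gene and therefore carries no information (p = 1). The choice of reference set matters: the same study list tested against a different background would give different p-values.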

Gene Ontology Enrichment Analysis
Using Disease Ontologies— Going beyond GO Annotations
Advances in Ontology Access and Automated Annotation
DIY Disease Ontology-based Enrichment Analysis Workflow
Creating Reference Sets for Custom Enrichment Analysis
Novel Use Cases Enabled
Findings
Summary