Abstract

A prevailing technique for inferring function from lists of identifications produced by high-throughput molecular biology experiments is over-representation analysis, in which the identifications are compared with predefined sets of related genes, often referred to as pathways. As at least some pathways are known to be incompletely annotated, algorithmic efforts have been made to complement them with information from functional association networks. While the terminology varies in the literature, we here refer to such methods as Network Enrichment Analysis (NEA). Traditionally, the significance of NEA inferences has been assessed using a null model constructed from randomizations of the network. Here we instead argue for a null model that relates more directly to the set of genes being studied, and we have designed a dynamic programming algorithm that calculates the distribution of NEA scores, making it possible to assign unbiased mid p values to inferences. We also implemented a random sampling method that carries out the same task. We demonstrate that our method obtains superior statistical calibration compared with the popular NEA inference engine, BinoX, while also providing statistics that are easier to interpret.
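
The abstract does not reproduce the NEA score (Eq (3)) or the exact null model, so the following is only a minimal sketch under stated assumptions, not GeneSetDP itself. It assumes, hypothetically, that each gene carries an integer contribution (for example its number of network links into the pathway), that the score of a query set is the sum of its members' contributions, and that the null model draws query sets of the same size uniformly at random. Under those assumptions a subset-sum style dynamic program counts how many k-subsets reach each score, and the mid p value P(S > s) + 0.5 P(S = s) follows directly. The function names score_distribution and mid_p_value are illustrative.

```python
from fractions import Fraction
from math import comb


def score_distribution(contributions, k):
    """Count k-subsets of genes by total score (subset-sum style DP).

    Hypothetical model: contributions[i] is the integer score gene i adds
    to a query set, and a query set's score is the sum over its members.
    Returns {total_score: number_of_k_subsets_with_that_score}.
    """
    if not 0 <= k <= len(contributions):
        raise ValueError("k must be between 0 and the number of genes")
    # counts[j][s]: number of j-subsets of the genes processed so far with score s
    counts = [{} for _ in range(k + 1)]
    counts[0][0] = 1
    for c in contributions:
        # Go from large j to small so each gene is added at most once.
        for j in range(k, 0, -1):
            for s, n in counts[j - 1].items():
                counts[j][s + c] = counts[j].get(s + c, 0) + n
    return counts[k]


def mid_p_value(contributions, k, observed):
    """Exact mid p value P(S > observed) + 0.5 * P(S = observed)."""
    dist = score_distribution(contributions, k)
    total = comb(len(contributions), k)  # every k-subset is equally likely
    greater = sum(n for s, n in dist.items() if s > observed)
    equal = dist.get(observed, 0)
    return Fraction(2 * greater + equal, 2 * total)


# Five toy genes; the query set is genes 0 and 3, so its score is 3 + 2 = 5.
contribs = [3, 0, 1, 2, 0]
print(float(mid_p_value(contribs, k=2, observed=5)))
```

Keeping the counts as exact integers and returning a Fraction avoids rounding error in very small tail probabilities; the cost grows with the number of genes, the query size and the range of attainable scores.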

Highlights

  • Over-Representation Analysis (ORA) is commonly used to infer function from sets of analytes such as genes, transcripts, proteins or metabolites [1,2,3]

  • We demonstrate that our method obtains superior statistical calibration compared with the popular Network Enrichment Analysis (NEA) inference engine, BinoX, while providing statistics that are easier to interpret

  • We implemented a Python program that reads network and pathway definition files, scores a query set against a pathway according to Eq (3), and uses GeneSetDP and GeneSetMC, described in the Algorithm section, to assign p values according to Eq (2)
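
GeneSetMC is described as a random sampling method for the same task. The sketch below is a hypothetical Monte Carlo analogue, not the authors' implementation: it assumes that each gene carries an integer contribution, that a query set's score is the sum of its members' contributions, and that the null model draws query sets of the same size uniformly at random without replacement. It then estimates the mid p value P(S > s) + 0.5 P(S = s) by counting how often random query sets beat or tie the observed score. The name mid_p_value_mc and its parameters are illustrative.

```python
import random


def mid_p_value_mc(contributions, k, observed, n_samples=10_000, seed=0):
    """Monte Carlo estimate of P(S > observed) + 0.5 * P(S = observed).

    Hypothetical model: S is the sum of the contributions of k genes drawn
    uniformly at random without replacement.
    """
    rng = random.Random(seed)  # seeded for reproducible estimates
    greater = equal = 0
    for _ in range(n_samples):
        score = sum(rng.sample(contributions, k))  # one random query set
        if score > observed:
            greater += 1
        elif score == observed:
            equal += 1
    return (greater + 0.5 * equal) / n_samples


# Toy per-gene scores; the exact mid p value for an observed score of 5 is 0.05.
contribs = [3, 0, 1, 2, 0]
print(mid_p_value_mc(contribs, k=2, observed=5, n_samples=20_000))
```

Sampling avoids enumerating the score distribution, but its standard error shrinks only as 1/sqrt(n_samples), so very small p values need many samples to resolve.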

Introduction

Over-Representation Analysis (ORA) is commonly used to infer function from sets of analytes such as genes, transcripts, proteins or metabolites [1,2,3]. One prominent application of the technique is expression analysis, where ORA is regularly used to assess alterations in pathway activity by examining analytes whose concentrations differ significantly between biological conditions, such as disease state or treatment group. Most ORA methods assess the overlap between the investigated set of analytes, the query set, and a functional module, the pathway set, using a hypergeometric test or Fisher's exact test. Variants such as Gene Set Enrichment Analysis (GSEA) [4] additionally incorporate information on the expression levels of the analytes in the query set.
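
As a minimal illustration of the classical ORA test described above, the one-sided hypergeometric p value is the probability that a query set drawn uniformly at random from the measured genes overlaps the pathway at least as much as observed; this equals the one-sided ("greater") Fisher's exact test on the corresponding 2x2 table. The function name ora_p_value and the toy numbers are illustrative. In practice, scipy.stats.hypergeom.sf(n_overlap - 1, n_universe, n_pathway, n_query) or scipy.stats.fisher_exact(table, alternative="greater") compute the same tail.

```python
from math import comb


def ora_p_value(n_universe, n_pathway, n_query, n_overlap):
    """One-sided hypergeometric p value P(X >= n_overlap).

    X counts pathway genes in a query set of size n_query drawn uniformly
    at random from n_universe genes, n_pathway of which are in the pathway.
    """
    total = comb(n_universe, n_query)
    tail = sum(
        comb(n_pathway, x) * comb(n_universe - n_pathway, n_query - x)
        for x in range(n_overlap, min(n_pathway, n_query) + 1)
    )
    return tail / total


# 20 measured genes, 5 in the pathway, 4 in the query set, 3 of them overlapping.
print(ora_p_value(20, 5, 4, 3))
```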
