Abstract

Background: One method to understand and evaluate an experiment that produces a large set of genes, such as a gene expression microarray analysis, is to identify overrepresentation or enrichment for biological pathways. Because pathways can functionally describe the set of genes, much effort has been made to collect curated biological pathways into publicly accessible databases. Combining disparate databases introduces highly related or redundant pathways, which makes consolidating them into pathway concepts essential. Consolidation facilitates unbiased, comprehensive yet streamlined analysis of experiments that result in large gene sets.

Methods: After gene set enrichment finds representative pathways for large gene sets, pathways are consolidated into representative pathway concepts. Three complementary but distinct methods of pathway consolidation are explored. Enrichment Consolidation iteratively combines pathways enriched for the signature gene list with other pathways that have similar signature gene sets. Weighted Consolidation uses a gene-weighting approach based on a Protein-Protein Interaction network to find clusters of both enriched and non-enriched pathways, limited to the experiment's resultant gene list. Finally, the de novo Consolidation method uses several measures of pathway similarity to find static pathway clusters independent of any given experiment.

Results: We demonstrate that the three consolidation methods provide unified yet different functional insights into a resultant gene set derived from a genome-wide profiling experiment. Results from the methods are presented, demonstrating their applications in biological studies and comparing them with a web-based pathway framework that also combines several pathway databases. Additionally, a web-based consolidation framework that encompasses all three methods discussed in this paper, Pathway Distiller (http://cbbiweb.uthscsa.edu/PathwayDistiller), is established to give researchers access to the methods and example microarray data described in this manuscript, and to let them analyze their own gene lists with these consolidation methods.

Conclusions: By combining several pathway systems, implementing different but complementary pathway consolidation methods, and providing a user-friendly, web-accessible tool, we have enabled users to extract functional explanations of their genome-wide experiments.

Highlights

  • One method to understand and evaluate an experiment that produces a large set of genes, such as a gene expression microarray analysis, is to identify overrepresentation or enrichment for biological pathways

  • Consolidation of enriched pathways: given the initial resultant gene set and the initial most enriched pathway, we iteratively reduce the list of enriched pathways and the resultant gene set by removing the genes in the most enriched pathway from the resultant gene set and recomputing enrichment p-values for the remaining pathways using the reduced gene set

  • We outline how the various consolidation methods find different descriptions of the resulting pathway concepts related to the supplied resultant gene list
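
The iterative reduction described in the second highlight can be sketched as a greedy loop. This is a minimal illustration, not the paper's implementation: the function names `upper_tail_p` and `consolidate`, the fixed `universe_size`, and the `alpha` stopping threshold are all assumptions introduced here, and the enrichment p-value is taken to be a one-sided hypergeometric (Fisher's exact) tail.

```python
from math import comb

def upper_tail_p(k, n, K, N):
    """P(overlap >= k) when n genes are drawn from N, of which K are in the pathway."""
    total = comb(N, n)
    return sum(comb(K, i) * comb(N - K, n - i) for i in range(k, min(n, K) + 1)) / total

def consolidate(gene_set, pathways, universe_size, alpha=0.05):
    """Greedy sketch: pick the most enriched pathway, drop its genes, re-score the rest.

    `pathways` maps a pathway name to a collection of gene identifiers.
    `alpha` is a hypothetical stopping threshold, not taken from the source.
    """
    remaining_genes = set(gene_set)
    remaining = {name: set(genes) for name, genes in pathways.items()}
    selected = []
    while remaining and remaining_genes:
        # Recompute enrichment p-values against the current (reduced) gene set.
        pvals = {
            name: upper_tail_p(len(remaining_genes & genes), len(remaining_genes),
                               len(genes), universe_size)
            for name, genes in remaining.items()
        }
        best = min(pvals, key=pvals.get)
        if pvals[best] > alpha:
            break  # nothing left that is enriched under the assumed threshold
        selected.append((best, pvals[best]))
        # Remove the chosen pathway and its genes before the next round.
        remaining_genes -= remaining.pop(best)
    return selected

toy_pathways = {
    "P1": {"a", "b", "c", "d"},
    "P2": {"a", "b", "c", "x"},  # largely redundant with P1
    "P3": {"e", "f"},
}
result = consolidate({"a", "b", "c", "d", "e", "f"}, toy_pathways, universe_size=20)
print([name for name, _ in result])
```

In this toy input, P2 shares most of its signature genes with P1, so once P1 is selected and its genes are removed, P2 no longer overlaps the reduced gene set and is not reported separately.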



Introduction

One method to understand and evaluate an experiment that produces a large set of genes, such as a gene expression microarray analysis, is to identify overrepresentation or enrichment for biological pathways. Several public data sources provide pathway annotations, including cellular process, metabolic process, molecular function, and physiological process: Biocarta [1], KEGG [2], WikiPathways [3], Pathway Commons [4], NCBI’s Biosystems [5], NCI Nature [6], Reactome [7] and HumanCyc (a member of the BioCyc database) [8]. These data sources provide information ranging from simple formats, for example a list of genes involved in a specific pathway, to complex information, such as a directed graph of biological entities and their effects on each other. When no ordering measurement is available for the genes, some other means, such as Fisher’s exact test, is necessary to find enriched pathways.
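
As an illustration of the last point, a one-sided Fisher’s exact test for overrepresentation is equivalent to the upper tail of a hypergeometric distribution. The sketch below uses only the Python standard library; the function name `enrichment_p_value` and the `universe_size` parameter (the number of genes that could have been selected) are assumptions made for this example and are not taken from the paper.

```python
from math import comb

def enrichment_p_value(gene_set, pathway, universe_size):
    """One-sided Fisher's exact test p-value for pathway overrepresentation.

    Returns the probability of seeing at least the observed overlap if
    len(gene_set) genes were drawn at random from `universe_size` genes,
    of which len(pathway) belong to the pathway.
    """
    gene_set, pathway = set(gene_set), set(pathway)
    n = len(gene_set)            # genes in the experimental result
    K = len(pathway)             # genes annotated to the pathway
    k = len(gene_set & pathway)  # observed overlap
    total = comb(universe_size, n)
    tail = sum(comb(K, i) * comb(universe_size - K, n - i)
               for i in range(k, min(n, K) + 1))
    return tail / total

# Toy example: all three result genes fall in a three-gene pathway, out of ten genes.
p_full = enrichment_p_value({"g1", "g2", "g3"}, {"g1", "g2", "g3"}, universe_size=10)
# No overlap at all gives a p-value of 1.
p_none = enrichment_p_value({"g1", "g2", "g3"}, {"g7", "g8", "g9"}, universe_size=10)
print(p_full, p_none)
```

The choice of gene universe strongly affects the p-value, so in practice it should match the set of genes the experiment could actually have detected.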

