Abstract

Pathway enrichment analysis represents a key technique for analyzing high-throughput omic data, and it can help to link individual genes or proteins found to be differentially expressed under specific conditions to well-understood biological pathways. We present here a computational tool, SEAS, for pathway enrichment analysis over a given set of genes in a specified organism against the pathways (or subsystems) in the SEED database, a popular pathway database for bacteria. SEAS maps a given set of genes of a bacterium to pathway genes covered by SEED through gene ID and/or orthology mapping, and then calculates the statistical significance of the enrichment of each relevant SEED pathway by the mapped genes. Our evaluation of SEAS indicates that the program provides highly reliable pathway mapping results and identifies more organism-specific pathways than similar existing programs. SEAS is publicly released under the GPL license agreement and freely available at http://csbl.bmb.uga.edu/~xizeng/research/seas/.

Highlights

  • High-throughput omic techniques are increasingly widely used by large research centers as well as by individual labs because of the rapidly decreasing costs and the increasing quality of the data generated

  • Statistical enrichment analysis methods fall into three classes according to their enrichment algorithms [13]: (i) singular enrichment analysis (SEA), which calculates an enrichment P-value for each pathway and lists the enriched pathways in a linear table, based on the hypergeometric distribution assumption [14] or Fisher's exact test [15,16], among a few other methods [17,18]; (ii) gene set enrichment analysis [19], which considers the entire gene set encoded in a genome and the associated experimental values; and (iii) modular enrichment analysis [20], which uses the key idea of SEA but considers pathway-pathway or gene-gene relations in its enrichment P-value calculation

  • We have evaluated different combinations of reference genomes in an iterative manner (Figures 2 and 3) based on the taxonomic distance, defined as the number of nodes in the path from the query organism to its closest common ancestor with the reference organism in the taxonomy tree defined in the KEGG Genome database
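The taxonomic distance described in the last highlight can be sketched as a walk up a taxonomy tree. This is only an illustration under stated assumptions: the tree below is a made-up toy with placeholder names (not the KEGG Genome taxonomy), the helper names `lineage` and `taxonomic_distance` are hypothetical, and the counting convention (the query itself is not counted, the common ancestor is) is an assumption, since the text does not say whether the endpoints are included.

```python
# Toy taxonomy stored as child -> parent. All names are placeholders,
# not entries from the KEGG Genome database.
PARENT = {
    "phylum_A": "root",
    "phylum_B": "root",
    "class_A1": "phylum_A",
    "class_A2": "phylum_A",
    "query_org": "class_A1",
    "ref_same_class": "class_A1",
    "ref_same_phylum": "class_A2",
    "ref_other_phylum": "phylum_B",
}


def lineage(node, parent):
    """Return the path from `node` up to the root, starting with `node` itself."""
    path = [node]
    while node in parent:
        node = parent[node]
        path.append(node)
    return path


def taxonomic_distance(query, reference, parent):
    """Count the nodes on the path from `query` up to its closest common
    ancestor with `reference`.

    Assumed convention: the query node is not counted, the ancestor is.
    """
    reference_lineage = set(lineage(reference, parent))
    for steps, node in enumerate(lineage(query, parent)):
        if node in reference_lineage:
            return steps
    raise ValueError("query and reference share no ancestor in this tree")


# Rank candidate reference organisms from closest to most distant.
candidates = ["ref_other_phylum", "ref_same_class", "ref_same_phylum"]
ranked = sorted(candidates, key=lambda r: taxonomic_distance("query_org", r, PARENT))
print(ranked)
```

Sorting candidate references by this distance gives one plausible order in which to try reference genomes iteratively, closest first.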


Summary

Introduction

High-throughput omic techniques are increasingly widely used by large research centers as well as by individual labs because of the rapidly decreasing costs and the increasing quality of the data generated.

P-MAP uses both high sequence similarity and operon information for orthologous gene mapping, and tends to make the mapping results more accurate than BDBH when it is applicable. When neither of these two methods provides useful mapping results, which could be true for partially sequenced genomes and meta-genomes, we use NCBI BLAST (blastp for protein queries, blastx for DNA queries; see Materials and Methods for the E-value cutoff) to compare the query genes/proteins against one or more user-specified reference genomes in SEED, and select the top hit with a known annotation in SEED.

We have compared the pathway annotation performance of the two programs on a newly sequenced genome, N. profundicola [33], using E. coli pathways in KEGG and SEED as references, respectively (using FDR ≤ 0.05 as cutoffs).

Pathway enrichment analysis with the whole E. coli genome as background:

    seas.exe pathfind -m hyper -1 example.ann -2 "Escherichia coli" > example.pathways

where -m specifies the statistical method (hyper for the hypergeometric test, binom for the binomial test, chisq for the chi-square test and fisher for Fisher's exact test), -1 specifies the sample annotation file from the above step, and -2 specifies the background annotation file, given either as a built-in whole genome by species name or as a file from the above step.
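To make the hypergeometric option and the FDR cutoff more concrete, here is a minimal Python sketch of a singular enrichment test. It is not the SEAS implementation: the function names and the gene counts are hypothetical, and the Benjamini-Hochberg procedure is assumed for FDR control because the text does not name the correction it uses.

```python
from math import comb


def hypergeom_enrichment_p(k, n, K, N):
    """Upper-tail hypergeometric P-value, P(X >= k).

    N: genes in the background, K: background genes in the pathway,
    n: genes in the sample,     k: sample genes in the pathway.
    """
    upper = min(n, K)
    favourable = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return favourable / comb(N, n)


def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted P-values in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P-value down so the adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical counts: pathway -> (sample genes in pathway, background genes in pathway).
background_size, sample_size = 4000, 100
counts = {"pathway_a": (12, 60), "pathway_b": (3, 90), "pathway_c": (5, 40)}

names = list(counts)
raw = [hypergeom_enrichment_p(counts[p][0], sample_size, counts[p][1], background_size)
       for p in names]
for name, p_raw, p_adj in zip(names, raw, benjamini_hochberg(raw)):
    flag = "enriched" if p_adj <= 0.05 else "not enriched"
    print(f"{name}\tP={p_raw:.3g}\tFDR={p_adj:.3g}\t{flag}")
```

The exact integer arithmetic in `math.comb` keeps the tail sum free of rounding error for genome-sized backgrounds; a library routine such as a statistics package's hypergeometric survival function would be a reasonable substitute.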
