Abstract

The rapid increase in the number of sequenced microbial genomes gives computational biologists unprecedented opportunities to decipher the genomic structures of these microbes by developing and applying advanced comparative genome analysis tools. In this presentation, we describe a systematic study on deciphering microbial genomic structures and linking the discovered structures to the prediction of metabolic pathways. The study has three main components: (a) deciphering microbial genomic structures and discovering new ones through the development and application of advanced comparative genome analysis tools, (b) a systematic study of the relationships between microbial genomic structures and metabolic pathways, carried out by mapping all KEGG pathways to over 300 microbial genomes, and (c) application of the discovered relationships between genomic structures and pathways to the prediction of biological pathways and networks.
A. Deciphering microbial genomic structures: We have recently developed a computer program, JPOP (1,2), for operon structure prediction in both bacterial and archaeal genomes. Testing on E. coli data with experimental validation indicates that the program has a prediction accuracy of about 80%. Since the publication of JPOP, a couple of other operon prediction programs have been published, including VIMSS (10) and Pathway Tools (11), which reach similar levels of prediction accuracy. Using these programs, we have made operon predictions for 300+ microbial genomes (all data are available upon request). This data set not only provides a rich source of information for our prediction of biological pathways and networks (see section C), but also facilitates the investigation of higher-level and less understood structures in microbial genomes.
Through comparative analyses of 300+ microbial genomes, we have recently firmly established the uber-operon, a concept introduced a few years ago by other authors, as a layer of genomic structure with direct implications for biological pathway prediction (3). For example, we have demonstrated that a number of well-studied metabolic pathways are made up of the genes of a small number of uber-operons, as opposed to a large number of operons (3). In addition, we have established some interesting relationships between uber-operons and regulons, which provide a solid stepping stone toward a general computer program for regulon prediction via the prediction of uber-operons. We have also recently developed an effective paradigm for predicting cis-regulatory elements (4) through comparative analysis of closely related genomes, providing another important piece of information for regulon prediction. We expect to be able to develop the first computer program for regulon prediction in the very near future.
B. Systematic mapping of metabolic pathways to microbial genomes: The metabolic pathways of the KEGG database provide a rich source of information that can be mapped directly to individual genomes. However, until very recently there has been no effective way of mapping KEGG pathways to genomes other than the simple-minded approach of sequence similarity search. We have recently demonstrated that BLAST search and its variations and generalizations, such as bidirectional best hit (BDBH) or COG search, do not provide satisfactory mapping results (5), as virtually all of these methods attempt to find orthologous gene relationships using sequence similarity information alone. We have recently developed a computer program, P-MAP, for mapping orthologous genes in the context of pathway mapping; it uses both sequence similarity information and genomic structure information and substantially improves the accuracy of pathway mapping.
The basic idea of P-MAP is to map the genes of a pathway to their homologous genes in the target genome under the condition that the mapped genes are grouped into a small number of operons. The limitation of the current P-MAP algorithm is that it assumes a template pathway whose individual components already have genes assigned in the template genome, which limits the direct application of KEGG (template) pathways. We have recently generalized the P-MAP framework so that a generic pathway model (consisting of enzymes and enzymatic reactions rather than specific genes assigned to each enzyme) can be mapped to a target genome by mapping individual enzymes to genes that are grouped into a number of operons in the target genome (6). Using this new capability, we have mapped the metabolic pathways of KEGG to 300+ microbial genomes (data are available upon request). A detailed analysis is currently under way to understand the general relationships between metabolic pathways and operon, uber-operon and regulon structures. We expect this analysis, which should be completed within the next few weeks, to lead to a new understanding of genomic structures and of the organization and evolution of metabolic pathways.
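
The mapping idea described above, choosing one target gene per enzyme while keeping the chosen genes in as few operons as possible, can be illustrated with a small sketch. This is not the published P-MAP algorithm: the function name map_pathway, the additive objective (total similarity minus a fixed penalty per operon used), the exhaustive search, and all gene and operon identifiers are assumptions made purely for illustration.

```python
from itertools import product


def map_pathway(candidates, operon_of, operon_penalty=1.0):
    """Pick one target gene per enzyme, favouring assignments that use few operons.

    candidates: dict mapping enzyme -> list of (gene, similarity score) pairs
    operon_of:  dict mapping gene -> operon identifier in the target genome
    Returns (assignment, objective), or (None, -inf) if no assignment exists.
    NOTE: hypothetical toy objective, not the scoring used by P-MAP.
    """
    enzymes = sorted(candidates)
    best, best_score = None, float("-inf")
    # Exhaustive search over all choices; only feasible for small pathways.
    for combo in product(*(candidates[e] for e in enzymes)):
        genes = [gene for gene, _ in combo]
        if len(set(genes)) < len(genes):
            continue  # in this toy model one gene cannot fill two enzyme roles
        similarity = sum(score for _, score in combo)
        operons_used = {operon_of[gene] for gene in genes}
        objective = similarity - operon_penalty * len(operons_used)
        if objective > best_score:
            best, best_score = dict(zip(enzymes, genes)), objective
    return best, best_score


# Made-up example: three enzymes, five candidate genes, three operons.
candidates = {
    "E1": [("g1", 0.90), ("g7", 0.95)],
    "E2": [("g2", 0.80)],
    "E3": [("g3", 0.70), ("g9", 0.75)],
}
operon_of = {"g1": "op1", "g2": "op1", "g3": "op1", "g7": "op2", "g9": "op3"}

with_structure, _ = map_pathway(candidates, operon_of)
similarity_only, _ = map_pathway(candidates, operon_of, operon_penalty=0.0)
print(with_structure)
print(similarity_only)
```

In this toy example the similarity-only mapping picks the top-scoring hit for each enzyme and scatters the pathway over three operons, whereas the operon-aware objective accepts slightly lower similarity scores to keep all three genes in a single operon. A realistic implementation would need a more efficient search and a principled weighting of the two kinds of evidence.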
