Abstract

Background: Reconstruction of biological pathways is typically done by mapping well-characterized pathways of model organisms to a target genome through orthologous gene mapping. A limitation of such pathway-mapping approaches is that the mapped pathway models are constrained by the composition of the template pathways; for example, some genes in a target pathway may not have corresponding genes in the template pathways, the so-called “missing gene” problem.

Methods: We present a novel pathway-expansion method for identifying additional genes that are possibly involved in a target pathway after pathway mapping, both to fill holes caused by missing genes and to expand the mapped pathway model. The basic idea of the algorithm is to identify genes in the target genome whose homologous genes share common operons with homologs of any mapped pathway genes in some reference genome, and to add such genes to the target pathway if their functions are consistent with the cellular function of the target pathway.

Results: We have implemented this idea using a graph-theoretic approach and demonstrated the effectiveness of the algorithm on known pathways of E. coli in the KEGG database. On all KEGG pathways containing at least 5 genes, our method achieves an average positive predictive value (PPV) of 60%, and performance improves as more seed genes are added. Analysis shows that our method is highly robust.

Conclusions: An effective method is presented for finding missing genes in biological pathways of prokaryotes, which achieves high prediction reliability on E. coli at the genome level. Numerous missing genes are found to be related to known E. coli pathways, and these predictions can be further validated through biological experiments. Overall, this method is robust and can be used for functional inference.

Highlights

  • Reconstruction of biological pathways is typically done through mapping well-characterized pathways of model organisms to a target genome, through orthologous gene mapping

  • While some success has been reported on these programs, there has been a general issue associated with such homologous pathway mapping-based approaches, which is that homologous pathways are generally not identical and the mapped pathways could miss some parts not covered by their well-characterized homologous template pathways

  • Among the various areas for further improvement, we identified a few that can possibly be addressed using currently available information: (i) there have been no reliable methods for considering and including functionally uncharacterized genes in partially predicted pathway models; (ii) while genomic synteny has been used to predict functionally associated genes, its true usefulness, beyond operon information, is yet to be well documented

Summary

Introduction

Reconstruction of biological pathways is typically done by mapping well-characterized pathways of model organisms to a target genome through orthologous gene mapping. While some success has been reported with these programs, homologous pathway mapping-based approaches share a general issue: homologous pathways are generally not identical, so the mapped pathways can miss parts that are not covered by their well-characterized homologous template pathways. This problem, called pathway holes or missing genes, has been widely recognized [6,7,8,9]. A number of methods have been developed to find such missing genes, based mainly on the idea of finding genes that are functionally associated with genes already in the mapped pathways. One class of such methods attempts to find enzyme-encoding genes missing in a mapped metabolic pathway based on multiple types of gene association information [8,9,10], taking advantage of the fact that genes encoding a metabolic pathway tend to group into clusters (e.g., operons). Full utilization of operon information should be a key direction for improving biological pathways, particularly as state-of-the-art prediction methods for operons have reached high accuracy (~90%) [17,18,19].
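
The operon-based idea stated in the abstract can be illustrated with a minimal Python sketch. This is not the authors' graph-theoretic implementation: it uses plain set operations, it omits the functional-consistency filter, and the names `expand_pathway`, `positive_predictive_value`, `homologs`, and `operons`, along with the toy gene identifiers, are hypothetical and chosen only for illustration. It assumes homolog assignments and operon predictions are already available as dictionaries.

```python
from typing import Dict, FrozenSet, List, Set


def expand_pathway(
    seed_genes: Set[str],
    homologs: Dict[str, Dict[str, Set[str]]],
    operons: Dict[str, List[FrozenSet[str]]],
) -> Set[str]:
    """Return target-genome genes whose homologs share an operon with a
    homolog of at least one seed (mapped pathway) gene in some reference genome.

    homologs: target gene -> reference genome -> homologous genes there
    operons:  reference genome -> list of operons (sets of gene ids)
    """
    candidates: Set[str] = set()
    for genome, genome_operons in operons.items():
        # Collect the homologs of all seed genes in this reference genome.
        seed_homologs: Set[str] = set()
        for seed in seed_genes:
            seed_homologs |= homologs.get(seed, {}).get(genome, set())

        # Gather every gene that sits in an operon with a seed homolog.
        linked: Set[str] = set()
        for operon in genome_operons:
            if operon & seed_homologs:
                linked |= operon

        # A non-seed target gene is a candidate if any of its homologs is linked.
        for gene, by_genome in homologs.items():
            if gene not in seed_genes and by_genome.get(genome, set()) & linked:
                candidates.add(gene)
    return candidates


def positive_predictive_value(predicted: Set[str], known: Set[str]) -> float:
    """PPV = TP / (TP + FP); returns 0.0 when nothing is predicted."""
    if not predicted:
        return 0.0
    return len(predicted & known) / len(predicted)


# Toy data (entirely made up): one reference genome "G1" with two operons.
toy_homologs = {
    "a": {"G1": {"x1"}},
    "b": {"G1": {"x2"}},
    "c": {"G1": {"x9"}},
}
toy_operons = {"G1": [frozenset({"x1", "x2"}), frozenset({"x9"})]}

predicted = expand_pathway({"a"}, toy_homologs, toy_operons)
```

In this toy case, gene `b` is proposed because its homolog `x2` shares an operon with `x1`, the homolog of seed gene `a`, while `c` is not. In a fuller version, each candidate would still need to be checked for consistency with the cellular function of the target pathway before being added.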
