Abstract

Background
A necessary step for a genome-level analysis of cellular metabolism is the in silico reconstruction of the metabolic network from genome sequences. The available methods are mainly based on genome annotation, which involves two successive steps: the prediction of coding sequences (CDSs) and the assignment of their functions. Annotation is time-consuming, and the available methods often have difficulty with unfinished genome sequences that still contain errors.

Results
In this work, a fast method is proposed that uses unannotated genome sequences to predict CDSs and to reconstruct metabolic networks in silico. Instead of using predicted genes or CDSs to query public databases, entries from public DNA or protein databases are used as queries to search a local database built from the unannotated genome sequence. Functions are assigned to the predicted CDSs at the same time. The well-annotated genome of Salmonella typhimurium LT2 is used to demonstrate the applicability of the method: 97.7% of the CDSs in the original annotation are correctly identified, and using the SWISS-PROT and TrEMBL databases identifies 98.9% of the CDSs that carry EC numbers in the published annotation. Furthermore, two versions of the genome sequence of the bacterium Klebsiella pneumoniae with different genome coverage (3.9-fold and 7.9-fold) are examined. The results suggest that 3.9-fold coverage of a bacterial genome is sufficient for the in silico reconstruction of the metabolic network. Compared with other gene-finding methods such as CRITICA, our method is better suited to exploiting sequences with low genome coverage.
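The reversed querying step can be illustrated with a minimal sketch. It assumes that each public database entry has already been aligned against the local genome database (for example with a protein-versus-translated-DNA search), yielding hits with contig coordinates, a score and the entry's EC number. The `Hit` record, the `predict_cds` function and the rule "among overlapping hits on the same strand, keep the best-scoring one" are hypothetical simplifications for illustration, not the published IdentiCS algorithm; in particular, the sketch does not extend hits to start and stop codons or handle frameshifts caused by sequencing errors.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Hit:
    """One alignment of a public database entry against the local genome database (hypothetical record)."""
    contig: str
    start: int          # 1-based, inclusive; start <= end
    end: int
    strand: str         # "+" or "-"
    score: float        # alignment score, higher is better
    query_id: str       # identifier of the database entry used as the query
    ec: Optional[str]   # EC number carried by the query entry, if any


def predict_cds(hits: List[Hit]) -> List[Hit]:
    """Merge overlapping hits on the same contig and strand; keep the best hit of each cluster.

    The best hit's coordinates become the predicted CDS, and its query_id and EC number
    are assigned as the function in the same pass.
    """
    ordered = sorted(hits, key=lambda h: (h.contig, h.strand, h.start))
    predicted: List[Hit] = []
    cluster: List[Hit] = []
    cluster_end = 0
    for h in ordered:
        same_region = (
            cluster
            and h.contig == cluster[0].contig
            and h.strand == cluster[0].strand
            and h.start <= cluster_end
        )
        if same_region:
            cluster.append(h)
            cluster_end = max(cluster_end, h.end)
        else:
            if cluster:
                predicted.append(max(cluster, key=lambda x: x.score))
            cluster = [h]
            cluster_end = h.end
    if cluster:
        predicted.append(max(cluster, key=lambda x: x.score))
    return predicted


# Hypothetical hits, e.g. from aligning protein entries to genome contigs.
example_hits = [
    Hit("contig1", 100, 1000, "+", 850.0, "P1", "2.7.1.1"),
    Hit("contig1", 120, 990, "+", 600.0, "P2", "2.7.1.1"),
    Hit("contig1", 1500, 2400, "-", 400.0, "P3", None),
    Hit("contig2", 50, 700, "+", 300.0, "P4", "1.1.1.1"),
]
for cds in predict_cds(example_hits):
    print(cds.contig, cds.start, cds.end, cds.strand, cds.query_id, cds.ec)
```

Here P1 and P2 overlap on the same strand, so only the higher-scoring P1 is reported; P3 and P4 each form their own predicted CDS. Because the query already carries its annotation, CDS prediction and function assignment happen together, which is the point of reversing the search direction.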
Based on the new method, a program called IdentiCS (Identification of Coding Sequences from Unfinished Genome Sequences) is provided that combines the identification of CDSs with the reconstruction, comparison and visualization of metabolic networks (free to download at ).

Conclusions
The reversed querying process and the program IdentiCS allow fast and adequate prediction of protein coding sequences and reconstruction of the potential metabolic network from low-coverage bacterial genome sequences. The new method can accelerate the use of genomic data for studying cellular metabolism.
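"Fold coverage" is the total number of sequenced bases divided by the genome size. As a rough intuition for why low coverage can still suffice, the sketch below uses the classic Lander-Waterman approximation, under which the expected fraction of the genome contained in at least one read is 1 - exp(-c). This idealized model (uniformly random reads, no errors or repeats) is an assumption added here for illustration and is not taken from the paper; the read and genome sizes in the usage line are hypothetical.

```python
import math


def fold_coverage(total_read_bases: int, genome_size: int) -> float:
    """Average number of times each genome position is sequenced (total bases / genome size)."""
    return total_read_bases / genome_size


def expected_fraction_covered(c: float) -> float:
    """Lander-Waterman estimate of the fraction of the genome hit by at least one read: 1 - exp(-c).

    Assumes reads are placed uniformly at random; ignores sequencing errors and repeats.
    """
    return 1.0 - math.exp(-c)


# Hypothetical numbers: 19.5 Mb of reads for a 5 Mb genome gives 3.9-fold coverage.
print(fold_coverage(19_500_000, 5_000_000))
for c in (3.9, 7.9):
    print(f"{c}-fold coverage: ~{expected_fraction_covered(c):.2%} of the genome expected in reads")
```

Under these idealized assumptions, about 98% of the genome is expected to be present at 3.9-fold coverage and about 99.96% at 7.9-fold, so most genes are at least partly represented even in the lower-coverage draft.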

Highlights

  • A necessary step for a genome-level analysis of cellular metabolism is the in silico reconstruction of the metabolic network from genome sequences

  • The results are summarized in Table 1. Using the Kyoto Encyclopedia of Genes and Genomes (KEGG) genome database, 92.6% of the coding sequences (CDSs) in the original annotation of S. typhimurium are identified; using the combined protein databases SWISS-PROT and TrEMBL, 97.7% are identified

  • Sensitivity at the nucleotide level (91.1% and 98.2% for the two databases, respectively) is similar to that at the CDS level. These results suggest that the SWISS-PROT-TrEMBL-based approach is preferable to the KEGG-genome-based approach for our method
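The two sensitivity measures can be sketched as follows. This is a minimal illustration under stated assumptions: an annotated CDS counts as "identified" if any predicted CDS on the same contig and strand overlaps it, and nucleotide-level sensitivity is the fraction of annotated coding positions (strand ignored) that fall inside a predicted CDS. The paper's exact matching criteria may differ, and the function names and toy intervals are hypothetical.

```python
from typing import List, Set, Tuple

# (contig, start, end, strand) with 1-based, inclusive coordinates
Interval = Tuple[str, int, int, str]


def cds_sensitivity(annotated: List[Interval], predicted: List[Interval]) -> float:
    """Fraction of annotated CDSs overlapped by at least one predicted CDS on the same strand."""
    def found(a: Interval) -> bool:
        return any(
            a[0] == p[0] and a[3] == p[3] and a[1] <= p[2] and p[1] <= a[2]
            for p in predicted
        )
    return sum(found(a) for a in annotated) / len(annotated)


def coding_positions(intervals: List[Interval]) -> Set[Tuple[str, int]]:
    """All (contig, position) pairs covered by the intervals; strand is ignored here."""
    return {(contig, i) for contig, start, end, _ in intervals for i in range(start, end + 1)}


def nucleotide_sensitivity(annotated: List[Interval], predicted: List[Interval]) -> float:
    """Fraction of annotated coding nucleotides that also lie inside a predicted CDS."""
    truth = coding_positions(annotated)
    return len(truth & coding_positions(predicted)) / len(truth)


annotated = [("c1", 1, 300, "+"), ("c1", 400, 999, "-"), ("c1", 1200, 1499, "+")]
predicted = [("c1", 31, 300, "+"), ("c1", 400, 999, "-")]
print(f"CDS level: {cds_sensitivity(annotated, predicted):.1%}")
print(f"Nucleotide level: {nucleotide_sensitivity(annotated, predicted):.1%}")
```

In this toy case, two of three annotated CDSs are found (2/3 at the CDS level), and 870 of 1,200 annotated coding nucleotides are covered (0.725 at the nucleotide level). The position-set approach is simple but memory-hungry; for a whole genome, sorted interval merging would be the more practical choice.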


Summary

Introduction

A necessary step for a genome-level analysis of cellular metabolism is the in silico reconstruction of the metabolic network from genome sequences. The available methods are mainly based on genome annotation, which involves two successive steps: the prediction of coding sequences (CDSs) and the assignment of their functions. Metabolic network reconstruction is generally based on identifying the metabolic enzymes, and the corresponding biochemical reactions, present in a specific organism. For this purpose, the EC numbers of all possible enzymes need to be determined. The set of EC numbers of an organism may be obtained from its genome annotation. This annotation-based route covers three successive steps: (1) gene finding, (2) database searching and function assignment, and (3) metabolic reconstruction. No details of the WIT approach have been published, and only limited information about 55 annotated genomes (status: March 2004) is publicly available on the WIT website [20].
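Step (3), turning an organism's set of EC numbers into a network, can be sketched as a lookup of each EC number in a reaction table followed by building a bipartite graph of metabolite-to-reaction and reaction-to-metabolite edges. The three glycolytic reactions below are hard-coded for illustration only; a real reconstruction would draw them from a reaction database. The `REACTIONS` table, `reconstruct` and `compare` are hypothetical names, reaction reversibility is ignored, and the comparison helper merely illustrates the kind of network comparison the abstract attributes to IdentiCS.

```python
from typing import Dict, List, Set, Tuple

# Illustrative mini reaction table: EC number -> (substrates, products).
REACTIONS: Dict[str, Tuple[List[str], List[str]]] = {
    "2.7.1.1": (["glucose", "ATP"], ["glucose 6-phosphate", "ADP"]),   # hexokinase
    "5.3.1.9": (["glucose 6-phosphate"], ["fructose 6-phosphate"]),    # glucose-6-phosphate isomerase
    "2.7.1.11": (["fructose 6-phosphate", "ATP"], ["fructose 1,6-bisphosphate", "ADP"]),  # 6-phosphofructokinase
}


def reconstruct(ec_numbers: Set[str]) -> Set[Tuple[str, str]]:
    """Build a bipartite metabolite -> EC -> metabolite edge set from an organism's EC numbers.

    EC numbers without an entry in REACTIONS are skipped.
    """
    edges: Set[Tuple[str, str]] = set()
    for ec in ec_numbers:
        if ec not in REACTIONS:
            continue
        substrates, products = REACTIONS[ec]
        edges.update((s, ec) for s in substrates)
        edges.update((ec, p) for p in products)
    return edges


def compare(ecs_a: Set[str], ecs_b: Set[str]) -> Dict[str, Set[str]]:
    """Shared and organism-specific EC numbers of two reconstructions."""
    return {"shared": ecs_a & ecs_b, "only_a": ecs_a - ecs_b, "only_b": ecs_b - ecs_a}


organism_a = {"2.7.1.1", "5.3.1.9", "2.7.1.11"}
organism_b = {"2.7.1.1", "5.3.1.9", "1.1.1.1"}  # 1.1.1.1 has no entry in the mini table
print(sorted(reconstruct(organism_a)))
print(compare(organism_a, organism_b))
```

A bipartite representation is used so that cofactors such as ATP and ADP connect to reactions rather than producing misleading direct edges like ATP to glucose 6-phosphate.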

