Abstract

Escherichia coli, one of the favorite model organisms, was initially annotated in 1997 and re-annotated in 2007. Despite years of intensive research on the E. coli genome, complete and accurate functional information for this organism is still not available. In E. coli, about 40% of the protein sequences have been annotated as hypothetical proteins because of a lack of information. Such sequences therefore require advanced computational strategies to derive clues to their biological role. Herein, we have re-annotated the complete genome of E. coli K-12 using the “Dynamic biological data fusion method”, a computational strategy that we applied to combine heterogeneous biological data sources, maximize knowledge sharing, and generate the intersection of the data sets. The functional re-annotation results reported in this paper allow us to present high-quality data on the complete proteome of E. coli K-12. We have updated all the protein-coding genes from previous annotation work and assigned new or more precise functions wherever possible. About 29% of the E. coli protein sequences previously annotated as unclear or unknown (hypothetical; without functions) have now been assigned clear, known functions. Further, the analysis resulted in the revision of protein sequences found to be false positives or poorly annotated. Information from this work is made available as a database, “REC-DB”, which will remain a useful repository of accurate and updated functional information. Availability: REC-DB is publicly available at http://192.168.2.168/recdb/index.html
