In Silico Proteomic Functional Re-annotation of Escherichia coli K-12 using Dynamic Biological Data Fusion Strategy

Ramesh Gopal, Subazini Thankaswamy Kosalai, Rajadurai Chinnasamy Perumal, Palani Kannan Kandavel

doi:10.5376/cmb.2014.04.0004

Abstract

Escherichia coli, one of the favorite model organisms, was initially annotated in 1997 and re-annotated in 2007. Although years of intensive research have been carried out on the E. coli genome, complete and accurate functional information for this organism is still not available. In E. coli, about 40% of the protein sequences have been annotated as hypothetical proteins because of a lack of information. Such sequences require advanced computational strategies to derive clues about their biological role. Herein, we have carried out a re-annotation of the complete genome of E. coli K-12 using the “Dynamic biological data fusion method”, a computational strategy that combines heterogeneous biological data sources to maximize knowledge sharing and generates the intersection of the data sets. The functional re-annotation results reported in this paper allow us to present high-quality data on the complete proteome of E. coli K-12. We have updated all the protein-coding genes from the previous annotation work and tried to assign new or more precise functions wherever possible. About 29% of the E. coli protein sequences previously annotated as unclear or unknown (hypothetical; without functions) have now been assigned clear, known functions. Further, the analysis resulted in the revision of protein sequences found to be false positives or poorly annotated. Information from this work is made available as a database, “REC-DB”, which will remain a useful repository with accurate and updated functional information. Availability: REC-DB is publicly available at http://192.168.2.168/recdb/index.html
