Abstract
Protein structures can provide functional evidence. Structural genomics efforts to identify the functions of hypothetical proteins have therefore advanced our understanding of biological systems. To this end, a new strategy for mining protein databases in search of candidates for function prediction is described here. The strategy was applied to Escherichia coli proteins deposited in the NCBI non-redundant database. Briefly, the data mining selects small conserved hypothetical proteins that have no significant templates in the Protein Data Bank, no transmembrane regions, and similarity to eukaryotic proteins. Through this strategy, 12 protein sequences were selected for molecular modelling from a total of 13,306 E. coli conserved hypothetical sequences. Of these, only three could be modelled. The GI 488361128 model was similar to cupredoxins, the GI 281178323 model was similar to β-barrel proteins, and the GI 227886634 model showed structural similarities to lipid-binding proteins. However, only GI 227886634 seems to have a function related to that of the similar structures, since it was the only model that kept its fold during the molecular dynamics simulation. The method described here can help select hypothetical sequences as targets for in vitro and/or in vivo functional characterization.
Highlights
In the post-genomic era, several protein sequences have become available through the conceptual translation of putative open reading frames (ORFs)
Starting from the NCBI non-redundant protein database (NR, downloaded in June 2011), E. coli proteins annotated as conserved hypothetical proteins, ranging from 30 to 200 amino acid residues and derived from mRNA, genomic DNA or experimental observation, were extracted to compose the initial data set
The functions of other hypothetical proteins may be adequately predicted through NMR spectroscopy or X-ray diffraction, as observed for the hypothetical protein MJ0577 from Methanococcus jannaschii, which was solved by X-ray diffraction [9]
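The initial extraction step described in the highlights (E. coli entries annotated as conserved hypothetical proteins, 30 to 200 residues long) can be illustrated with a minimal sketch. It is not the authors' pipeline: the function names `parse_fasta` and `select_candidates` are hypothetical, and the sketch assumes the annotation and organism name both appear as plain text in each FASTA header. It does not check the source of the sequence (mRNA, genomic DNA or experimental observation) and does not cover the later filters (Protein Data Bank templates, transmembrane regions, similarity to eukaryotic proteins).

```python
from typing import Iterator, List, Tuple


def parse_fasta(text: str) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs from FASTA-formatted text."""
    header = None
    chunks: List[str] = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new record starts: emit the previous one, if any.
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def select_candidates(
    text: str,
    organism: str = "Escherichia coli",
    annotation: str = "conserved hypothetical protein",
    min_len: int = 30,
    max_len: int = 200,
) -> List[Tuple[str, str]]:
    """Keep records whose header mentions the annotation and organism
    (assumed header layout) and whose length is within [min_len, max_len]."""
    selected = []
    for header, seq in parse_fasta(text):
        h = header.lower()
        if (
            annotation.lower() in h
            and organism.lower() in h
            and min_len <= len(seq) <= max_len
        ):
            selected.append((header, seq))
    return selected
```

The length bounds are inclusive on both ends, matching the "ranging from 30 to 200 amino acid residues" wording. For a database the size of NR, a streaming parser that reads the file line by line would be preferable to holding the whole text in memory.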
Summary
In the post-genomic era, several protein sequences have become available through the conceptual translation of putative open reading frames (ORFs). These proteins have been annotated without detailed structural analysis or functional evaluation. Several problems related to data integrity have been reported, such as sequencing errors or sample contamination [2]. Protein function can be predicted by many methods based on sequence similarity. On the assumption that similar sequences have similar functions, such approaches have created prospects for rapid progress in molecular biology [3]. In vitro and/or in vivo validation is essential to make predictions more accurate and to avoid false results [4].