Abstract

Background: Annotation ambiguities and annotation errors are a general challenge in genomics. A reliable protein function assignment can be obtained by experimental characterization, but this is expensive and time-consuming. As a result, the number of such Gold Standard Proteins (GSPs) with experimental support remains very low compared to the number of proteins annotated by sequence homology, usually through automated pipelines. Even a GSP may give a misleading assignment when used as a reference: the homolog may be close enough to support isofunctionality, but the substrate of the GSP is absent from the species being annotated. In such cases, the enzymes cannot be isofunctional. Here, we examined a variety of such issues in halophilic archaea (class Halobacteria), with a strong focus on the model haloarchaeon Haloferax volcanii.

Results: We identified annotated proteins of Hfx. volcanii for which public databases tend to assign a function that is probably incorrect. In some cases, an alternative and probably correct function can be predicted or inferred from the available evidence. This alternative has not been adopted by public databases because experimental validation is lacking. In other cases, a specific function predicted by homology is probably invalid, and although there is evidence against the assigned function, the true function remains elusive. We listed 50 such cases, each with detailed background information, so that a conclusion about the most likely biological function can be drawn. For brevity and readability, only the key aspects are given in the main text. Detailed information is provided in a corresponding section of the Supplementary Materials.

Conclusions: Compiling, describing and summarizing these open annotation issues and functional predictions will benefit the scientific community in the general effort to evaluate protein function assignments more rigorously and to document them in more detail.
By highlighting the gaps and likely annotation errors currently in the databases, we hope this study will provide a framework for experimentalists to systematically confirm (or disprove) our function predictions or to uncover yet more unexpected functions.

Highlights

  • Haloferax volcanii is a model organism for halophilic archaea [1,2,3,4,5,6], for which an elaborate set of genetic tools has been developed [7,8,9]

  • Many annotation errors are caused by invalid annotation transfer between presumed homologs; once introduced, these errors are further spread by annotation robots. This problem can be partially overcome by using a Gold Standard Protein (GSP)-based annotation strategy [11]

  • We described a detailed bioinformatic reconstruction of riboflavin biosynthesis [236]

Summary

Introduction

Haloferax volcanii is a model organism for halophilic archaea [1,2,3,4,5,6], for which an elaborate set of genetic tools has been developed [7,8,9]. Many annotation errors are caused by invalid annotation transfer between presumed homologs; once introduced, these errors are further spread by annotation robots. This problem can be partially overcome by using a Gold Standard Protein (GSP)-based annotation strategy [11]. As sequence identity decreases, the assumption of isofunctionality becomes increasingly uncertain. This uncertainty may be counterbalanced by additional evidence, e.g., gene clustering, but experimental confirmation remains the best option for validating an annotation. Even a GSP may give a misleading assignment when used as a reference: the homolog may be close enough to support isofunctionality, but the substrate of the GSP is absent from the species being annotated. In such cases, the enzymes cannot be isofunctional.

By highlighting the gaps and likely annotation errors currently in the databases, we hope this study will provide a framework for experimentalists to systematically confirm (or disprove) our function predictions or to uncover yet more unexpected functions.
