A manual curation strategy to improve genome annotation: application to a set of haloarchael genomes.

Friedhelm Pfeiffer, Dieter Oesterhelt

doi:10.3390/life5021427

Abstract

Genome annotation errors are a persistent problem that impedes research in the biosciences. A manual curation effort is described that attempts to produce high-quality genome annotations for a set of haloarchaeal genomes (Halobacterium salinarum and Hbt. hubeiense, Haloferax volcanii and Hfx. mediterranei, Natronomonas pharaonis and Nmn. moolapensis, Haloquadratum walsbyi strains HBSQ001 and C23, Natrialba magadii, Haloarcula marismortui and Har. hispanica, and Halohasta litchfieldiae). Genomes are checked for missing genes, start codon misassignments, and disrupted genes. Assignments of a specific function are preferably based on experimentally characterized homologs (Gold Standard Proteins). To avoid overannotation, which is a major source of database errors, we restrict annotation to general function assignments when support for a specific substrate assignment is insufficient. This strategy results in annotations that are resistant to the plethora of errors that compromise public databases. Annotation consistency is rigorously validated for ortholog pairs from the genomes surveyed. The annotation is regularly crosschecked against the UniProt database to further improve annotations and increase the level of standardization. Enhanced genome annotations are submitted to public databases (EMBL/GenBank, UniProt), to the benefit of the scientific community. The enhanced annotations are also publicly available via HaloLex.
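The ortholog-pair consistency check mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not say how the authors implement this check, so everything here is an assumption made for illustration: the `Annotation` record, the `inconsistent_ortholog_pairs` function, the gene identifiers, and the rule that two names agree when they match after trimming whitespace and ignoring case.

```python
from typing import Dict, List, NamedTuple, Tuple


class Annotation(NamedTuple):
    """Hypothetical minimal annotation record (not the authors' data model)."""
    gene_id: str
    protein_name: str


def inconsistent_ortholog_pairs(
    annotations: Dict[str, Annotation],
    ortholog_pairs: List[Tuple[str, str]],
) -> List[Tuple[str, str]]:
    """Return the ortholog pairs whose protein names disagree.

    Assumption: names are compared after stripping surrounding whitespace
    and lowercasing; a real curation pipeline would likely need richer rules.
    """
    mismatches = []
    for gene_a, gene_b in ortholog_pairs:
        name_a = annotations[gene_a].protein_name.strip().lower()
        name_b = annotations[gene_b].protein_name.strip().lower()
        if name_a != name_b:
            mismatches.append((gene_a, gene_b))
    return mismatches


# Hypothetical gene identifiers and names, for illustration only.
annotations = {
    "gA_1": Annotation("gA_1", "Bacteriorhodopsin"),
    "gB_1": Annotation("gB_1", "bacteriorhodopsin "),
    "gA_2": Annotation("gA_2", "ABC-type transport system permease protein"),
    "gB_2": Annotation("gB_2", "probable sugar ABC transporter permease"),
}
pairs = [("gA_1", "gB_1"), ("gA_2", "gB_2")]
flagged = inconsistent_ortholog_pairs(annotations, pairs)
```

In this sketch a flagged pair is only a candidate for manual review: a curator would still decide which name, if either, is supported by an experimentally characterized homolog.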

Highlights

Protein function assignments in public databases suffer from severe errors
We describe an effort for a high-quality annotation of a set of haloarchaeal genomes
The annotation of Hbt. salinarum strain NRC-1 [45], the classical genome of halophilic archaea, is covered by our approach, as most of its genes are represented in strain R1 with an identical protein sequence

Summary

Introduction

Protein function assignments in public databases suffer from severe errors. Genomes are commonly subjected to automatic annotation procedures by computational annotation robots. As these procedures build on the information provided in public databases, errors in the database may be “propagated, leading to a potential transitive catastrophe” [3]. Error propagation could be substantially reduced if annotations are copied only from those proteins which themselves have been functionally characterized. Such proteins are referred to as “Gold Standard Proteins” [4,5]. The SwissProt section of UniProt is a rich source for Gold Standard


Journal: Life
Publication Date: Jun 2, 2015
Citations: 85
License type: CC BY 4.0


Similar Papers

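Study of newly discovered halorhodopsins from Haloarcula marismortui and Haloquadratum walsbyi reveals a high degree of conservation that differs from ion-transporting proteins in other microorganisms
01 Jan 2009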

Transformation of members of the genus Haloarcula with shuttle vectors based on Halobacterium halobium and Haloferax volcanii plasmid replicons
S W Cline ... W F Doolittle
jb | VOL. 174
01 Feb 1992

Characterization of paralogous and orthologous members of the superoxide dismutase gene family from genera of the halophilic archaebacteria.
P Joshi ... P P Dennis
jb | VOL. 175
01 Mar 1993

Repertoire-wide gene structure analyses: a case study comparing automatically predicted and manually annotated gene models
Jeanne Wilbrandt ... Kristen A Panfilio
BMC Genomics | VOL. 20
17 Oct 2019
