Abstract

Genome annotation currently tends to represent a static snapshot. Routine re-annotation, perhaps using wiki software, would help.

Highlights

  • Before addressing the problems with annotation, I will first summarize how it is done

  • You think that gene you just retrieved from GenBank [1] is correct? Are you certain? If it is a eukaryotic gene, and especially if it is from an unfinished genome, there is a pretty good chance that the amino acid sequence is wrong

  • The process of sequencing and annotating the DNA of a bacterial species has become highly automated in recent years, but the major steps are quite similar to what was done for the very first bacterial genome, Haemophilus influenzae, in 1995 [5]

Summary

What is genome annotation?

Before addressing the problems with annotation, I will first summarize how it is done. The laboratory steps have not changed greatly since H. influenzae: they begin with DNA purification, followed by shearing the DNA into countless small fragments (the ‘shotgun’ step). These fragments are cloned, sequenced from both ends, and assembled, usually resulting in a set of contiguous DNA sequences (contigs) joined together into larger scaffolds.

Annotation of the assembled sequence then begins. First, a gene finder (such as, for bacteria, Glimmer [6] or GeneMark [7]) is run over the genome, producing a set of predicted protein-coding genes. These programs are very accurate, though not perfect. The predicted proteins are then searched against a protein database with BLAST. For each gene that has a significant match, the BLAST output can be used to assign a name and function to the protein. The accuracy of this step depends on the annotation software and on the quality of the annotations already in the database. The pipeline software will usually take extra steps to find any genes missed by earlier steps; typically this involves running a translated search, in which all six possible translations of the unannotated sections are aligned to a database.
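The translated search mentioned above depends on producing six protein translations of a stretch of DNA: three reading frames on the forward strand and three on the reverse complement. The following is a minimal sketch of that one step in Python, not the code of any particular annotation pipeline. The function names are hypothetical, the input is assumed to be plain A/C/G/T text, and the standard genetic code is assumed (for bacteria, the codon assignments are the same; only the permitted start codons differ). A real pipeline would go on to align each translation to a protein database, which is not shown here.

```python
from itertools import product

# Standard genetic code, listed in the conventional TCAG codon order.
# '*' marks a stop codon.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def translate(seq):
    """Translate complete codons from position 0; unknown codons become 'X'."""
    return "".join(
        CODON_TABLE.get(seq[i:i + 3], "X")
        for i in range(0, len(seq) - 2, 3)
    )


def six_frame_translations(seq):
    """Map frame labels ('+1'..'+3', '-1'..'-3') to protein strings."""
    seq = seq.upper()
    frames = {}
    for strand, strand_seq in (("+", seq), ("-", reverse_complement(seq))):
        for offset in range(3):
            frames[f"{strand}{offset + 1}"] = translate(strand_seq[offset:])
    return frames


if __name__ == "__main__":
    # Toy input chosen for illustration only: a start codon, one codon, a stop.
    for frame, protein in six_frame_translations("ATGGCCTAA").items():
        print(frame, protein)
```

Translating every frame, rather than only predicted genes, is what lets the search pick up coding regions the gene finder missed, at the cost of six times as many candidate sequences to align.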

Partial and draft genomes
The role of GenBank
Some inconvenient truths
Possible solutions