Abstract

The generation of genomic information has increased exponentially in the past decade, and while this has aided biological research immensely, it has also created problems. Information is being compiled much faster than it can be interpreted, and this has led to potential annotation errors in genomic databases. The Kyoto Encyclopedia of Genes and Genomes (KEGG) is a widely used genomic database that compiles biochemical pathways. KEGG uses the genome of the bacterium Escherichia coli as a reference genome for some of these pathways. The biosynthesis of the amino acid histidine is a complex, multistep process that requires many enzymes to catalyze a sequence of biochemical reactions. E. coli utilizes a fused, bifunctional imidazoleglycerol‐phosphate dehydratase/histidinol‐phosphatase encoded by the hisB gene to catalyze the sixth and eighth steps in the pathway. However, other bacteria may be able to catalyze these steps using two separate proteins instead of a single fused protein like the one utilized by E. coli. The deep‐sea autotrophic bacterium Thiomicrospira crunogena has been proposed to catalyze these two steps with a pair of separate proteins, encoded by the Tcr_0356 and Tcr_1966 genes, instead of a fused protein. As a consequence, when the genome of T. crunogena was compared against the reference genome, KEGG determined that this organism cannot perform the phosphatase activity in the eighth step, and it was therefore machine‐called as a histidine auxotroph. By expressing and isolating the proteins and performing biochemical assays, evidence has been collected for a protein hypothesized to be a histidinol‐phosphatase. A malachite green phosphatase assay was used to assess the phosphatase activity of the protein encoded by Tcr_0356, while an IGP‐dehydratase assay was used to assess the activity of the protein encoded by the Tcr_1966 gene.
The results of this study reinforce the need for accurate annotations in databases in order to maintain the integrity of genomic data.

Support or Funding Information

This work is based upon ideas generated and supported by the National Science Foundation under Grant Number 0954829, funding the Microbial Genome Annotation network.
