Abstract

During microbial evolution, genome rearrangement increases with increasing sequence divergence. If the relationship between synteny and sequence divergence can be modeled, gene clusters in genomes of distantly related organisms exhibiting anomalous synteny can be identified and used to infer functional conservation. We applied the phylogenetic pairwise comparison method to establish and model a strong correlation between synteny and sequence divergence in all 634 available Archaeal and Bacterial genomes from the NCBI database and four newly assembled genomes of uncultivated Archaea from an acid mine drainage (AMD) community. In parallel, we established and modeled the trend between synteny and functional relatedness in the 118 genomes available in the STRING database. By combining these models, we developed a gene functional annotation method that weights evolutionary distance to estimate the probability of functional associations of syntenous proteins between genome pairs. The method was applied to the hypothetical proteins and poorly annotated genes in newly assembled acid mine drainage Archaeal genomes to add or improve gene annotations. This is the first method to assign possible functions to poorly annotated genes through quantification of the probability of gene functional relationships based on synteny at a significant evolutionary distance, and has the potential for broad application.

Highlights

  • Gene function prediction is currently one of the fundamental problems in microbiology [1]

  • Based on trends between gene sequence divergence and gene order divergence over time, we developed a new synteny-based method to refine functional annotation

  • This method uses these trends to determine the probability that any two syntenous genes are functionally related

Summary

Introduction

Gene function prediction is currently one of the fundamental problems in microbiology [1]. In the dataset of full Bacterial and Archaeal genomes from NCBI, 874,583 genes out of 2,668,809 (~33%) are annotated as hypothetical proteins, and 25% of the protein families in the PFAM database have unknown functions [1]. In addition to these unannotated genes, many genes in these databases have only general function predictions or may have incorrect function predictions. Various protein function prediction methods make use of synteny, as reviewed by Rogozin et al. in 2004 [3,7,8,9,10,11], but do not consider the evolutionary distance between genomes in their predictions. Snel et al. simulated random genome shuffling to determine the probability of conserved gene order in a specific number of genomes [13], and Von Mering et al. assessed the likelihood of protein relatedness
