Abstract
Background: Comparison of complete genomes of Bacteria and Archaea shows that gene content varies considerably and that genomes evolve quite rapidly via gene duplication and deletion and horizontal gene transfer. We analyze a diverse set of 92 Bacteria and 79 Archaea in order to investigate the processes governing the origin and evolution of families of related genes within genomes.
Results: Genes were clustered into related groups using similarity criteria derived from BLAST. Most clusters contained genes from only one or a small number of genomes, and relatively few core clusters were found that spanned all genomes. Gene clusters found in larger numbers of genomes tended to have larger numbers of genes per genome; however, clusters with unusually large numbers of genes per genome were found among both narrowly and widely distributed clusters. Larger genomes were found to have larger mean gene family sizes and a greater proportion of families of very large size. We used a model of birth, death, and innovation to predict the distribution of gene family sizes. The key parameter is r, the ratio of duplications to deletions. It was found that the model can give a good fit to the observed distribution only if there are several classes of genes with different values of r. The preferred model in most cases had three classes of genes.
Conclusions: There appears to be a rapid rate of origination of new gene families within individual genomes. Most of these gene families are deleted before they spread to large numbers of genomes, which suggests that they may not be generally beneficial to the organisms. The family size distribution is best described by a large fraction of families that tend to have only one or two genes and a small fraction of families of multi-copy genes that are highly prone to duplication. Larger families occur more frequently in larger genomes, indicating higher r in these genomes, possibly due to a greater tolerance for non-beneficial gene duplicates. The smallest genomes contain very few multi-copy families, suggesting a high rate of deletion of all but the most beneficial genes in these genomes.
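To make the role of r concrete, the following is a minimal sketch, not the authors' implementation. It assumes the simplest linear form of a birth, death, and innovation process: each gene duplicates and is deleted at constant per-gene rates, and new families arise with one member. Under that assumption the equilibrium family size distribution is proportional to r^n / n, and several gene classes combine as a weighted mixture. The function names, the truncation at `n_max`, and all numerical values of r and class weights are hypothetical and chosen only for illustration.

```python
def family_size_pmf(r, n_max):
    """Truncated equilibrium distribution of family sizes for one gene class.

    Assumes (illustrative assumption, not taken from the abstract) a linear
    birth-death-innovation process, for which P(n) is proportional to
    r**n / n, where r is the ratio of duplication to deletion rates and
    n = 1..n_max.
    """
    weights = [r ** n / n for n in range(1, n_max + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def mixture_pmf(classes, n_max):
    """Family size distribution for several gene classes.

    classes: list of (weight, r) pairs, with weights summing to 1.
    Returns a list whose entry i is the probability of family size i + 1.
    """
    pmf = [0.0] * n_max
    for weight, r in classes:
        for i, p in enumerate(family_size_pmf(r, n_max)):
            pmf[i] += weight * p
    return pmf


if __name__ == "__main__":
    # Hypothetical three-class example: most families have low r (mostly
    # single-copy), while a small fraction is highly prone to duplication.
    classes = [(0.85, 0.2), (0.12, 0.7), (0.03, 0.97)]
    pmf = mixture_pmf(classes, n_max=100)
    for n in (1, 2, 5, 10, 50):
        print(f"P(family size = {n}) = {pmf[n - 1]:.3e}")
```

In this sketch a single class with small r concentrates almost all probability on families of one or two genes, while adding a small class with r close to 1 produces the long tail of large families, which is the qualitative pattern the abstract describes.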
Highlights
Comparison of complete genomes of Bacteria and Archaea shows that gene content varies considerably and that genomes evolve quite rapidly via gene duplication and deletion and horizontal gene transfer
It was estimated that the ‘extended core’ of bacterial genes contained only 250 genes [4]
Clustering: Complete nucleic acid and translated proteome sequences were obtained from the NCBI Genomes database for 92 Bacteria from over a dozen phyla (Additional File 1, Table 5), sequenced as part of the Genomic Encyclopedia of Bacteria and Archaea [20], and 79 Archaea (Additional File 1, Table 6)
Summary
Comparison of complete genomes of Bacteria and Archaea shows that gene content varies considerably and that genomes evolve quite rapidly via gene duplication and deletion and horizontal gene transfer. We analyze a diverse set of 92 Bacteria and 79 Archaea in order to investigate the processes governing the origin and evolution of families of related genes within genomes. A large number of completely sequenced genomes of Bacteria and Archaea are available for studying evolution at the whole-genome level. Comparison of sets of genes across genomes reveals that gene content varies quite substantially even between fairly closely related species. When diverse groups of genomes are compared, the set of core genes falls to very low numbers. It was estimated that the ‘extended core’ of bacterial genes (i.e., those present in 99% of sequenced genomes) contained only 250 genes [4]. A study aimed at constructing a universal phylogenetic tree [5] found only 31 genes present as clear orthologues in all genomes.