Abstract

Background: With the exponential growth of publicly available genome sequences, pangenome analyses have provided increasingly complete pictures of genetic diversity for many microbial species. However, relatively few studies have scaled beyond single pangenomes to compare global genetic diversity both within and across different species. We present here several methods for “comparative pangenomics” that can be used to contextualize multi-pangenome scale genetic diversity with gene function for multiple species at multiple resolutions: pangenome shape, genes, sequence variants, and positions within variants.

Results: Applied to 12,676 genomes across 12 microbial pathogenic species, we observed several shared resolution-specific patterns of genetic diversity: First, pangenome openness is associated with species’ phylogenetic placement. Second, relationships between gene function and frequency are conserved across species, with core genomes enriched for metabolic and ribosomal genes and accessory genomes for trafficking, secretion, and defense-associated genes. Third, genes in core genomes with the highest sequence diversity are functionally diverse. Finally, certain protein domains are consistently mutation enriched across multiple species, especially among aminoacyl-tRNA synthetases where the extent of a domain’s mutation enrichment is strongly function-dependent.

Conclusions: These results illustrate the value of each resolution at uncovering distinct aspects in the relationship between genetic and functional diversity across multiple species. With the continued growth of the number of sequenced genomes, these methods will reveal additional universal patterns of genetic diversity at the pangenome scale.

Highlights

  • With the exponential growth of publicly available genome sequences, pangenome analyses have provided increasingly complete pictures of genetic diversity for many microbial species.

  • Openness is most commonly estimated as the power law exponent obtained by fitting Heaps’ law to pangenome size versus number of genomes, averaged across many iterations of randomly shuffling genome order [6]. This application follows the law’s origin in linguistics, where it describes an empirical relationship between the number of unique words encountered and the number of documents reviewed; an analogous relationship between genes encountered and genomes sequenced has been observed for multiple bacterial pangenomes [6, 11].

  • Multilocus sequence type (MLST) classification revealed that the genomes available for some species were highly biased toward one or a few subtypes (e.g. 75% of E. faecium genomes are from MLST 80), while others were more diverse (Fig. 1b, Fig. S1b).
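
The Heaps’ law openness estimate described in the highlights above can be illustrated with a short sketch. This is not the authors’ implementation. It assumes a hypothetical boolean gene presence/absence matrix (`presence`, genomes by genes), averages the pangenome growth curve over random genome orderings, and fits P(N) = κ·N^γ by ordinary least squares in log-log space. The function names, the number of shuffles, and the log-log fitting choice are all assumptions made for illustration.

```python
import numpy as np

def pangenome_curve(presence, order):
    """Number of distinct genes seen after adding genomes in the given order."""
    seen = np.cumsum(presence[order], axis=0) > 0  # gene seen in any genome so far
    return seen.sum(axis=1)

def heaps_exponent(presence, n_shuffles=100, seed=0):
    """Fit P(N) = kappa * N**gamma to the shuffle-averaged pangenome curve.

    presence: boolean array of shape (n_genomes, n_genes); hypothetical input.
    Assumes at least one genome carries at least one gene, so that the
    averaged curve is strictly positive and its logarithm is defined.
    Returns (gamma, kappa).
    """
    rng = np.random.default_rng(seed)
    n_genomes = presence.shape[0]
    curves = np.array([
        pangenome_curve(presence, rng.permutation(n_genomes))
        for _ in range(n_shuffles)
    ])
    mean_curve = curves.mean(axis=0)
    n = np.arange(1, n_genomes + 1)
    # Linear fit of log(P) against log(N): slope is gamma, intercept is log(kappa).
    gamma, log_kappa = np.polyfit(np.log(n), np.log(mean_curve), 1)
    return gamma, np.exp(log_kappa)

# Two toy extremes: identical genomes (closed) and fully disjoint genomes (open).
closed = np.ones((10, 5), dtype=bool)
opened = np.eye(10, dtype=bool)
gamma_closed, _ = heaps_exponent(closed)
gamma_open, _ = heaps_exponent(opened)
```

Under this parameterization, an exponent near 0 means new genomes add almost no new genes (a closed pangenome), while larger exponents indicate a more open pangenome. Some works instead fit the number of new genes per added genome, which yields a differently defined exponent, so values are only comparable within one convention.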

Summary

Introduction

With the exponential growth of publicly available genome sequences, pangenome analyses have provided increasingly complete pictures of genetic diversity for many microbial species. However, few studies describe methods for comparing distinct pangenomes beyond the sizes of core or accessory genomes. Since Tettelin et al. introduced the bacterial pangenome and Heaps’ law as a model for quantifying and comparing openness [6], other multi-pangenome works have compared pangenome openness estimates using models other than Heaps’ law [11], the level of conservation within core genomes [12, 13], the extent of functional characterization in core genomes and pangenomes [14], and functional distributions between core and accessory genomes of different species or environmental isolates [11, 12, 15, 16]. These methods focus primarily on pangenome scaling or the distribution of gene-level functions and are limited in their analysis of finer genetic variation, such as the individual sequence variants often examined in single-pangenome studies. Existing pangenome studies often present a tradeoff between “scale” (number of species, genomes, or pangenomes analyzed) and “resolution” (smallest unit of genetic diversity analyzed).

