Abstract
The availability of large metagenomic data offers great opportunities for the population genomic analysis of uncultured organisms, which represent a large part of the unexplored biosphere and play a key ecological role. However, the majority of these organisms lack a reference genome or transcriptome, which constitutes a technical obstacle for classical population genomic analyses. We introduce the metavariant species (MVS) model, in which a species is represented only by intra-species nucleotide polymorphism. We designed a method combining reference-free variant calling, multiple density-based clustering and maximum-weighted independent set algorithms to cluster intra-species variants into MVSs directly from multisample metagenomic raw reads without a reference genome or read assembly. The frequencies of the MVS variants are then used to compute population genomic statistics such as FST, in order to estimate genomic differentiation between populations and to identify loci under natural selection. The MVS construction was tested on simulated and real metagenomic data. MVSs showed the required quality for robust population genomics and allowed an accurate estimation of genomic differentiation (ΔFST < 0.0001 and <0.03 on simulated and real data respectively). Loci predicted under natural selection on real data were all detected by MVSs. MVSs represent a new paradigm that may simplify and enhance holistic approaches for population genomics and the evolution of microorganisms.
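As an illustration of how variant frequencies can feed a differentiation statistic, the sketch below computes a pairwise FST for a single biallelic locus from read counts in two samples. It uses the classical heterozygosity-based definition, FST = (HT - HS) / HT, with both populations weighted equally; this is an assumption made for the example and not necessarily the estimator used by the authors. The function names `variant_frequency` and `pairwise_fst` are hypothetical.

```python
def variant_frequency(alt_reads, depth):
    """Frequency of the variant allele in one sample, from read counts."""
    if depth <= 0:
        raise ValueError("depth of coverage must be positive")
    return alt_reads / depth


def pairwise_fst(p1, p2):
    """FST = (HT - HS) / HT for one biallelic locus and two populations.

    p1 and p2 are the variant frequencies in each population.
    Assumption: both populations are weighted equally.
    """
    p_bar = (p1 + p2) / 2
    h_t = 2 * p_bar * (1 - p_bar)  # expected heterozygosity, pooled
    if h_t == 0:
        return 0.0  # locus is monomorphic overall, no differentiation
    h_s = (2 * p1 * (1 - p1) + 2 * p2 * (1 - p2)) / 2  # mean within-population
    return (h_t - h_s) / h_t


# Hypothetical read counts for one variant in two samples
p_a = variant_frequency(alt_reads=4, depth=20)   # 0.2
p_b = variant_frequency(alt_reads=16, depth=20)  # 0.8
fst = pairwise_fst(p_a, p_b)
```

Identical frequencies give an FST of 0, and fixed differences (0 versus 1) give an FST of 1, which matches the usual interpretation of the statistic.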
Highlights
Thanks to advances in deep sequencing and metagenomics, microorganism genomic resources have become more widely available over the last two decades
Metavariant species as a new way of modelling organisms from metagenomic data: in the absence of a reference genome to guide metagenomic data analyses for population genomics, we model species only by their variable loci
These loci are characterized by their associated depths of coverage and variant frequencies across environmental samples. We call this model the metavariant species (MVS), and we propose a method for constructing MVSs from multisample raw metagenomic data (Fig 1)
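To make the model concrete, here is a minimal sketch of how an MVS could be represented in code: a set of variable loci, each carrying a depth of coverage and a variant read count per environmental sample. The class and field names (`Metavariant`, `MetavariantSpecies`, `depth`, `alt_reads`) are hypothetical and chosen for illustration; they are not taken from the authors' software.

```python
from dataclasses import dataclass, field
from statistics import median
from typing import Dict, List, Optional


@dataclass
class Metavariant:
    """One variable locus, described only by per-sample read counts."""
    variant_id: str
    depth: Dict[str, int]      # sample name -> depth of coverage
    alt_reads: Dict[str, int]  # sample name -> reads supporting the variant

    def frequency(self, sample: str) -> Optional[float]:
        """Variant frequency in a sample, or None if the locus is uncovered."""
        d = self.depth.get(sample, 0)
        if d == 0:
            return None
        return self.alt_reads.get(sample, 0) / d


@dataclass
class MetavariantSpecies:
    """A species represented only by its variable loci (no reference genome)."""
    name: str
    variants: List[Metavariant] = field(default_factory=list)

    def median_depth(self, sample: str) -> float:
        """Median depth of coverage of the MVS variants in one sample."""
        return median(v.depth.get(sample, 0) for v in self.variants)


# Hypothetical example with two variants and two samples
mvs = MetavariantSpecies(
    name="mvs_1",
    variants=[
        Metavariant("v1", {"s1": 10, "s2": 20}, {"s1": 5, "s2": 5}),
        Metavariant("v2", {"s1": 14, "s2": 0}, {"s1": 7, "s2": 0}),
    ],
)
```

Keeping depth and variant read counts per sample, rather than a single pooled frequency, keeps both quantities that characterize the loci available for later steps.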
Summary
Thanks to advances in deep sequencing and metagenomics, microorganism genomic resources have become more widely available over the last two decades. By analyzing community assemblies containing a large number of uncultured species [1], we are gaining a better understanding of microbial ecology. This is especially the case for marine, soil and gut microbiomes, which have been intensively investigated thanks to large sequencing consortia such as Tara Oceans [2, 3], TerraGenome [4] and MetaHIT [5].